FOREWORD

Are  four  different  colors  all  that  are  needed  in  making  a map?  This 
is  known  as  the  four  color  problem,  and  it  has  challenged  mathematicians 
for  many  years.  Recently,  with  the  aid  of  a computer,  this  problem 
has  been  solved  in  the  affirmative.  This  certainly  points  out  the  fact 
that  computers  are  powerful  tools,  and  one  might  be  led  to  the  wrong 
conclusion  about  their  ability  to  solve  all  the  problems  faced  by  scientists. 
David  Hilbert,  a famous  German  mathematician,  suggested  that  the  consistency 
of  any  set  of  axioms  could  be  proven  by  step-by-step  analysis.  In  1931, 
the Austrian mathematician Kurt Gödel proved his celebrated incompleteness
theorem,  which  showed  that  not  all  problems  can  be  handled  in  this  manner. 
While  this  fact  may  be  a little  discouraging,  still  it  hasn't  slowed 
individuals  in  the  process  of  finding  applications  of  numerical  analysis 
and  the  use  of  computers.  One  has  only  to  scan  the  program  listed  on 
the  next  pages  to  realize  that  step-by-step  analysis  has  become  an 
indispensable  tool  for  treating  scientific  data. 

The  1978  Army  Numerical  Analysis  and  Computers  Conference  had  as  its 
host the U. S. Army Missile Research and Development Command at Redstone
Arsenal, Alabama. The Army Mathematics Steering Committee (AMSC), which
sponsors these conferences, was pleased to have one of its members,
Dr. Siegfried H. Lehnigk, serve as chairman on local arrangements. He
has  served  in  a similar  capacity  for  two  other  Army-wide  conferences 
sponsored  by  the  AMSC.  In  1969,  Dr.  Lehnigk  took  on  the  responsibility 
for  the  15th  Conference  on  the  Design  of  Experiments  in  Army  Research, 
Development  and  Testing;  and  in  1971,  he  guided  the  local  arrangements 
for  the  17th  Conference  of  Army  Mathematicians.  Those  in  attendance 
at  each  of  these  meetings  were  certainly  fortunate  to  have  such  an  able 
gentleman looking after their needs while away from their home offices.

This conference continues a series of meetings held annually since 1959
to provide a forum for the exchange of information among the scientific
and technical staffs of Army computation centers, and to provide the AMSC
with information on the current and future needs of the Army in numerical
analysis and related fields of mathematics and computing.

The responsibility for organizing these conferences rests in the hands
of  the  AMSC  Subcommittee  on  Numerical  Analysis  and  Computers.  Members 
of  this  committee  selected  "Software  Reliability"  as  the  theme  for  the 
meeting  at  Redstone  Arsenal.  Many  of  the  contributed  papers  as  well  as 
three  of  the  four  invited  speakers  listed  below  stressed  this  area  of 
scientific  endeavor. 


Speaker and Institution          Title of Address

Stephen S. Yau                   On Error Resistant Software
Northwestern University          Design

Carl de Boor                     Best Nodes for Polynomial
Mathematics Research Center      Interpolation

Lloyd Fosdick                    Some Algorithms for the Analysis
University of Colorado           of Computer Programs

Sol Greenspan                    Software Structuring: Concepts
University of Toronto            and Methods


Members  of  the  AMSC  are  taking  this  opportunity  to  express  their  thanks 
to  the  U.  S.  Army  Missile  Research  and  Development  Command  for  serving 
as  host  of  this  conference.  They  realize  that  all  the  speakers  spent  a 
lot  of  time  and  effort  in  preparing  and  presenting  their  papers,  and 
would  like  to  let  them  know  their  efforts  were  appreciated  by  all  those 
in attendance at this meeting. Many of these papers appear in these
Proceedings so that the important results they contain can be studied
and used by members of the scientific community.
ON ERROR-RESISTANT SOFTWARE DESIGNS*

Stephen  S.  Yau 

Department  of  Electrical  Engineering  and  Computer  Science 
Northwestern  University 
Evanston,  Illinois  60201 

ABSTRACT.  Error-resistant  software  is  a piece  of  software  which  can  resist 
the  adverse  effects  of  errors.  It  can  provide  high  reliability  and  continuous 
availability  of  computing  resource,  even  in  the  presence  of  software  errors  and 
hardware  faults.  This  feature  is  very  important  for  many  real-time  systems, 
such  as  missile  defense  systems,  military  communication  systems,  air  traffic 
control  systems,  and  telephone  switching  systems.  In  this  paper,  the  concept 
of error-resistant software designs is introduced, and various design
approaches will be presented. Further research and development in this area
will also be discussed.


1. INTRODUCTION. Error-resistant software is a piece of software which can
resist the adverse effects of errors [1]. In order to accomplish this goal,
the program must possess the capabilities to detect errors, to locate errors
and contain their propagation, and to recover from the errors. Many existing
software systems contain errors but are reliable in terms of producing correct
results. High reliability and continuous availability, even in the presence of
software errors and hardware faults, are very important for many real-time
systems, such as missile defense systems, military communication systems, air
traffic control systems, and telephone switching systems.

The  capabilities  of  error  detection,  error  containment  and  recovery  can  be 
implemented  by  self-checking  software  [2].  A piece  of  self-checking  software 
contains  software  redundancy  in  the  program  to  check  the  dynamic  behavior  for 
proper  operation  during  its  execution.  When  an  abnormal  behavior  is  detected, 
it  will  be  interpreted  as  an  error  and  the  error  will  be  isolated  to  minimize 
its propagation. A recovery procedure is then attempted to correct the
abnormal behavior. It involves repairing the damage caused by the error and
correcting the error if possible. Some transient errors can be corrected by
repeating the operation. Other, permanent damage can be repaired by
reconstructing the value of the mutilated items from redundant information
stored in the program or from a safe backup copy stored at an earlier state.
Some rollback and retry operations are usually involved. Some errors can be
corrected by switching to a duplicate copy of the hardware for hardware faults
or to a different version of the algorithm for software errors. Human
intervention may sometimes be necessary.


*This work was supported by the U.S. Army Research Office
Grant No. DAAG-29-76-G-0183.
From the reliability point of view, a piece of well-designed error-resistant
software should detect an error at the "earliest" time, contain the damaging
effects of an error to the "smallest" domain, and perform the most "reliable"
recovery. This requires a great deal of software redundancy (code and data).
Although a fast recovery is then guaranteed, this capability also incurs a very
large overhead during the normal operation of the program. There is a tradeoff
between the overhead of validation checks and the cost of recovery. However,
with the rapid improvement of hardware cost and speed, many of the validation
checks can be performed concurrently and economically with the normal data
processing operations in a multiprocessor system. Error-resistance can then be
achieved economically with very little cost in terms of real-time overhead.

In this paper, we will present various approaches to error-resistant software
design, and discuss further research and development in this area.
2. TYPES OF ERRORS. Software can be roughly partitioned into two types,
the application programs and the operating system programs. The application
programs perform the processing of data required by the users. The operating
system acts as an interface between the hardware and the application programs.
It provides the user with a more usable virtual machine. It is responsible for
the  allocation  of  resources,  the  synchronization  of  the  physical  progress  of  the 
processes,  and  the  protection  of  the  integrity  of  the  processes  as  well  as  data 
stored  in  the  computer  system.  All  these  tasks  have  to  be  performed  efficiently 
in  terms  of  time  and  the  amount  of  resources  required. 

The behavior of a program may be considered at three levels: the function
performed by each module (module level), the flow of control and data between
modules (program level), and the interactions that a program has with other
software components of the system (system level). If the normal behavior of an
application program at each of these three levels can be specified, then its
execution can be monitored to check whether it is performing properly. At the
module level, the function of a module is to perform a transformation from its
input data to its output data [3]. This transformation can be specified by
describing the legitimate values of the input data objects, and the
corresponding output values that a correctly functioning module would produce.
At the program level, the normal flow of control and data between modules in a
program can be specified by indicating how control can be transferred between
modules, under what circumstances each of the control transfers may be taken,
and what data should be passed at each interface. (We assume that global data
will be passed as parameters, at least conceptually.) At the system level, the
interactions that a program is allowed to have with its external environment
can be specified by indicating which objects the program is authorized to
interact with, under what conditions those interactions may occur, and what
information should be communicated in those interactions. The system level
also includes other aspects of an operating system, such as resource
allocation, job management and memory management. Deviations from the
specified behavior at one of the levels will be detected by the system as a
software error or hardware fault at that level. If possible, recovery
procedures will then be initiated to allow continuation of normal execution.
Complete recovery may be impossible, but it may be possible to continue in a
degraded mode of operation and still produce useful results.

Since  errors  are  detected  as  deviations  from  the  normal  behavior  of  the 
program,  the  error  detection  code  is  designed  to  detect  both  software  errors  and 
hardware faults. In other words, the error detection routines do not make a
sharp distinction among the origins of the error, whether it is caused by a
hardware or software failure. There are several reasons for this consideration.
As hardware becomes more complicated, it is plagued by design errors as well as
component failures. Under these circumstances, hardware and software errors
are indistinguishable. We have to try to cope with the error situation without
knowing whether the error is caused by a program bug or a hardware fault. Since
software errors are very general in nature, many of the precautions taken
against software errors will be useful for many types of hardware faults, such
as transient faults and multiple faults, for which there exists no effective
hardware diagnosis technique at present. It is not always advisable to depend
solely on hardware redundancy to achieve hardware fault-tolerance. A piece of
error-resistant software should resist the adverse effects of any error,
regardless of whether it is caused by software or hardware.

We will assume that a separate set of hardware diagnostics is available to
verify the integrity of the system hardware after an error is detected and that
hardware redundancy is available to enable hardware fault-tolerant operation.
In general, we may be concerned with software errors of the following
categories in the three levels:

Module  Level  Errors  which  cause  a module  to  perform  the  function  specified 
by  the  programmer  improperly.  These  errors  occur  when 

• a module  does  not  start  execution  because  the  input  data  to  the  module  do 
not  meet  its  input  specifications, 

• a module starts execution, but does not complete execution because of an
error condition detected by the hardware, e.g., division by zero, illegal
opcode or address out of range,

• a module does complete execution, but the output data of the module do not
meet its output specifications.
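
The three module level cases above can be illustrated with a small wrapper that checks a module's input and output specifications. This is only a minimal sketch: the names `checked_module`, `ModuleLevelError` and `divide` are hypothetical and do not come from the paper, and a Python `ArithmeticError` stands in for an error condition detected by the hardware.

```python
class ModuleLevelError(Exception):
    """A module violated a specification or did not complete execution."""


def checked_module(input_spec, output_spec):
    """Wrap a module so the three module level error cases are detected."""
    def decorate(module):
        def run(*args):
            # Case 1: the input data do not meet the input specification.
            if not input_spec(*args):
                raise ModuleLevelError("input specification violated")
            # Case 2: execution starts but is aborted by a detected condition
            # (ArithmeticError stands in for a hardware-detected trap).
            try:
                result = module(*args)
            except ArithmeticError as exc:
                raise ModuleLevelError("execution did not complete") from exc
            # Case 3: execution completes but the output violates its spec.
            if not output_spec(result, *args):
                raise ModuleLevelError("output specification violated")
            return result
        return run
    return decorate


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


@checked_module(
    input_spec=lambda a, b: _is_number(a) and _is_number(b),
    output_spec=lambda q, a, b: abs(q * b - a) <= 1e-9 * max(1.0, abs(a)),
)
def divide(a, b):
    return a / b
```

Calling `divide(1, 0)` or `divide("x", 1)` raises `ModuleLevelError`, so the caller sees a single kind of failure regardless of which of the three cases occurred.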

Program Level Errors which cause the flow of control and data between modules
in a program to deviate from the specifications of the programmer. These
errors occur when

• there is a wrong transfer of control between modules,

• there is an infinite internal loop (within a module),

• there is an infinite external loop (between modules).
System Level Errors which cause the program not to interact with objects
in the system in the manner specified by the programmer. These errors occur
when

• the program is not authorized to interact in the manner specified by the
programmer,

• the program is authorized to interact in the manner specified, but the
interaction failed because

a. the program itself has an error,

b. the program it was interacting with has an error,

c. the data it was interacting with have an error,

• the operating system does not operate properly because

a. it has resource control errors,

b. it has memory access right errors,

c. it has decision errors,

d. it has time dependent synchronization errors.

In  this  paper,  we  will  present  some  of  the  available  methods  dealing 
with  these  types  of  errors  and  also  discuss  future  work  in  this  area. 


3. USE OF SELF-CHECKING SOFTWARE. Many self-checking techniques can be
incorporated in software design to verify the correct operation of the system
during execution. Here, we will discuss some of the self-checking techniques.
For more detailed discussion, the reader is referred to [2]. Basically, all
the self-checking techniques can be classified according to the three aspects
of a process that the technique is checking:

• the  function  of  a process 

• the  control  sequence  of  a process 

• the  data  of  a process. 

3.1 Functional Checking. The functional aspects of a process can be
checked by verifying the reasonableness of the output for a given set of input.
This can be implemented easily if the function of the process is well defined
mathematically. For example, if the output is the solution of a set of
mathematical equations, its correctness can be verified by substituting the
solution into the equations and checking for consistency. In some cases, there
may exist a simple relationship among the output variables, and the correctness
of the output can be verified by checking this relationship, e.g., the ordering
of the output of a sorting program. Other obviously incorrect results, such as
arithmetic overflow, will also indicate an error. However, in most cases, the
correctness of the output can only be verified by an algorithm which is just as
complicated, and therefore as unreliable, as the program to be tested, e.g., a
compiler, a scheduler, etc. In these cases, a rigorous check on the correctness
of the output is infeasible and we can only check the reasonableness of the
output to determine its reliability. For example, if x is the solution of an
optimization problem, the reasonableness of x can be validated by comparing
the value of the objective function at x with its value at other values of
x. An unreasonable output usually indicates the presence of errors, but
not vice versa. The formulation of an effective set of reasonableness checks
is a difficult task. A common technique is to use the consistency
rather than the correctness of the output as the reasonableness criterion.
For example, when a record corresponding to a key is located by a search, we
can perform the reverse translation to verify that the record will produce
that key. The consistency and validity of the input variables can also be
checked before they are accepted. It is then hoped that no abnormal behavior
of the program will result when the input variables are processed.
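
Three of the checks mentioned above (substituting a solution back into its equations, checking a relationship among the outputs of a sort, and reverse translation after a search) might be sketched as follows. The function names and the tolerance are illustrative assumptions, not part of the original paper.

```python
from collections import Counter


def residual_ok(a, b, x, tol=1e-9):
    """Substitute x back into A x = b and check every equation."""
    return all(
        abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i) <= tol
        for row, b_i in zip(a, b)
    )


def sort_output_ok(inp, out):
    """The output must be nondecreasing and a permutation of the input."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    return ordered and Counter(inp) == Counter(out)


def lookup_ok(record, key, key_of):
    """Reverse translation: the record found must map back to the key."""
    return key_of(record) == key
```

Each check is much simpler than the computation it guards, which is what makes it usable as an independent test: `residual_ok` needs only multiplications and additions even when the solver itself is elaborate.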


3.2  Control  Sequence  Checking.  The  control  sequence  of  a process  refers 
to  the  sequence  of  computations  executed.  In  terms  of  the  program  graph,  this 
corresponds  to  a path  from  the  entry  node  to  the  exit  node  of  the  graph.  When 
each node represents a well-defined operation, the particular sequence in which
these operations are executed will determine the outcome of a process. Any
deviation from the correct execution sequence will lead to an unreliable result.
Such an error in the control sequence is called a control fault and it can be
roughly classified into one of the following groups:

• an  infinite  loop  is  executed 

• a loop  is  executed  an  incorrect  number  of  times 

• an  illegal  branch  (non-existent  branch  in  the  program  graph)  is  taken 

• a wrong  branch  is  taken. 

When the control sequence of a process is known and fixed, it is relatively
simple to monitor the execution sequence to verify its correctness
against any control fault. Infinite loops can be checked by defining an
upper bound on the number of times a loop can be executed. When the
execution sequence is not fixed, we can check if the sequence is a valid
(though not necessarily correct) execution sequence [4]. In this method,
all valid control sequences can be derived from the program graph. At
the entry to each node, a check is made to ensure that the sequence up to
this point is valid. Illegal branches are readily checked this way. An
illegal branch can also be checked by a scheme called relay-runner [5].
In this scheme, a "baton", which is similar to a password, is carried along
with the transfer of control and checked at appropriate check points.
When an illegal branch is taken, control will not possess the valid up-to-date
baton value and the error will be detected at the next check point, as
illustrated in Fig. 1. Illegal branches can also be partially checked by
defining legal entry points as the only valid destinations of control from one
part of the program to another. By limiting the area to which control can be
transferred, a range check can also detect illegal branches partially. Other
illegal branches can be avoided by using some defensive programming techniques.
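
The baton idea can be sketched in a few lines. The following is a hedged illustration only, not the relay-runner scheme of [5] itself: the program graph, the node names and the `Baton` class are hypothetical, and the sketch combines the baton with a validity check against the program graph.

```python
class ControlFault(Exception):
    """Control reached a check point without a valid baton."""


# Hypothetical program graph: each node maps to its legal successors.
PROGRAM_GRAPH = {
    "entry": {"read"},
    "read": {"compute"},
    "compute": {"compute", "write"},
    "write": set(),
}


class Baton:
    def __init__(self, graph, entry):
        self.graph = graph
        self.holder = entry   # node that last passed its check point
        self.target = entry   # node the baton has been handed to

    def hand_off(self, src, dst):
        """Called by src just before it transfers control to dst."""
        if src != self.holder or dst not in self.graph[src]:
            raise ControlFault(f"illegal transfer {src} -> {dst}")
        self.target = dst

    def check_in(self, node):
        """Check point at the entry to node."""
        if node != self.target:
            raise ControlFault(f"{node} entered without the baton")
        self.holder = node
```

If control jumps into `write` while the baton is still held by `compute`, `check_in("write")` raises `ControlFault` at the first check point reached. A wrong branch that is nevertheless legal in the graph is not detected by this sketch.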

There are, however, few effective techniques for checking wrong branching
and incorrect loop termination. When a branch condition can be checked
by an independent set of logical variables, some wrong branching can be
detected. In the control fault detection method developed by Kane and Yau [4],
wrong branches are detected by placing the decision logic in each successor
node rather than the conventional practice of placing the decision logic in
the predecessor node. In this way, each control decision is checked several
times, leading to multiple simultaneous execution sequences in the case of
an error in a single decision computation. When the number of times a loop
should  be  executed  can  be  calculated  from  the  value  of  some  variables,  a 
consistency  check  can  be  performed  on  this  number  with  the  termination 
condition  of  the  loop. 
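
Both loop checks discussed in this subsection, an upper bound on the number of iterations and a consistency check between a predicted iteration count and the termination condition, can be sketched as follows. The two functions are illustrative assumptions chosen because their iteration counts are easy to predict; they do not appear in the paper.

```python
class ControlFault(Exception):
    """A loop ran more or fewer times than its specification allows."""


def binary_search(items, key):
    """Search a sorted list, guarding the loop with an upper bound."""
    bound = len(items).bit_length()   # at most floor(log2 n) + 1 probes
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        probes += 1
        if probes > bound:
            raise ControlFault("possible infinite loop in binary_search")
        mid = (lo + hi) // 2
        if items[mid] == key:
            return mid
        if items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def count_chunks(total, chunk):
    """Consume `total` items in chunks, checking the loop count on exit."""
    if total < 0 or chunk <= 0:
        raise ValueError("total must be >= 0 and chunk must be > 0")
    expected = -(-total // chunk)     # ceil(total / chunk), computed up front
    remaining, passes = total, 0
    while remaining > 0:
        passes += 1
        if passes > expected:
            raise ControlFault("loop ran more often than predicted")
        remaining -= min(chunk, remaining)
    if passes != expected:
        raise ControlFault("loop ended earlier than predicted")
    return passes
```

The predicted count is computed before the loop from variables the loop does not modify, so a corrupted loop variable or termination test shows up as a mismatch.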


3.3 Data Checking. Although data usually refer to the information
items to be processed by the program, they may be generalized to include
all information stored within the computer system. In a stored-program
control system, these include both program instructions and data. The
mutilation of such information stored in the computer can be caused by residual
software errors as well as hardware errors. Checking can be performed on
the following aspects of the data:

• the  integrity  of  data  value 

• the  integrity  of  data  structure 

• the  nature  of  data  value. 

The integrity of data value can be protected easily by maintaining a check
sum of the content of a piece of code (instruction and data). The check sum
can be checked periodically or at the time before a piece of code is executed
to validate its integrity [5,6]. It is important to recompute the check
sum if some legitimate modifications are made on the data during the execution.
A simple check sum is the modular sum of the data values, as shown in Fig. 2.
The integrity of state data stored in the system tables can also be checked
by verifying their consistency with the actual states of the resources.
Error-detecting codes can be employed to code the state of flags.
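
A modular-sum check sum with recomputation on legitimate updates might look like the following. This is a minimal sketch and not a reproduction of Fig. 2; the modulus of 2**16 and the `GuardedBlock` class are assumptions made for illustration.

```python
MODULUS = 2 ** 16  # assumed word size for the check sum


def check_sum(words, modulus=MODULUS):
    """Modular sum of the data values."""
    return sum(words) % modulus


class GuardedBlock:
    """A block of words protected by a stored check sum."""

    def __init__(self, words):
        self.words = list(words)
        self.stored = check_sum(self.words)

    def verify(self):
        """Return True if the block still matches its stored check sum."""
        return check_sum(self.words) == self.stored

    def update(self, index, value):
        """Legitimate modification: verify, change a word, recompute."""
        if not self.verify():
            raise ValueError("block was corrupted before the update")
        self.words[index] = value
        self.stored = check_sum(self.words)
```

A modular sum detects any single changed word but can miss compensating changes, for example two words altered by +1 and -1, which is one reason the text also mentions error-detecting codes.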

There are a number of techniques to check the integrity of the data
structures, especially for linked lists. We can add an additional pointer to
the end of the linked list and periodically trace the list to verify that the
list is intact [7]. Redundant state information of the items can be kept to
verify that items linked together satisfy the condition corresponding to
the list [8]. Redundant linking can also be performed by using a
bi-directional linked list in place of a uni-directional linked list [6]. When
an item is located, inserted, or deleted, a check can be made on the link in
the opposite direction to verify its integrity after the corresponding
operation.
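
The redundant end pointer and the redundant back links can be combined in one integrity trace, sketched below. The class names are hypothetical, and the stored item count is an extra piece of redundancy assumed here so that the trace also terminates on a cyclic list.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class CheckedList:
    """Bi-directional list with a redundant tail pointer and item count."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.count += 1

    def is_intact(self):
        """Trace forward; every back link must point at the predecessor."""
        prev, node, seen = None, self.head, 0
        while node is not None:
            if node.prev is not prev:
                return False          # forward and backward links disagree
            seen += 1
            if seen > self.count:
                return False          # cycle or unexpected extra nodes
            prev, node = node, node.next
        # The trace must end exactly at the redundant tail pointer.
        return prev is self.tail and seen == self.count
```

A periodic call to `is_intact()` corresponds to tracing the list as described above; a cheaper local variant would check only the two links adjacent to the item just inserted or deleted.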


The correct operation of the system can also be checked by monitoring
the behavior of critical data variables during execution time. Data filters
can be inserted to check that a piece of data does not fall outside its
maximum and minimum values [9], as shown in Fig. 3. The mean and variance
of important parameters can also be measured during execution time to detect
abnormal behavior. Assertions on the relationship between variables can
be inserted at appropriate check points of the program to detect incorrect
computations. This technique is especially effective if a computation can
be divided into stages and intermediate results can be checked easily.
The assertions formulated will be similar to those used in proving program
correctness.
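
A data filter and a running mean and variance monitor could be sketched as follows. This is not a reproduction of Fig. 3; the class names and the threshold of four standard deviations are assumptions, and the running statistics use Welford's update, which the paper does not prescribe.

```python
class DataFilter:
    """Range check: reject values outside [minimum, maximum]."""

    def __init__(self, name, minimum, maximum):
        self.name = name
        self.minimum = minimum
        self.maximum = maximum

    def __call__(self, value):
        if not (self.minimum <= value <= self.maximum):
            raise ValueError(f"{self.name}={value!r} is out of range")
        return value


class RunningStats:
    """Running mean and sample variance (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_abnormal(self, x, k=4.0):
        """Flag x if it lies more than k standard deviations from the mean."""
        return self.n > 1 and abs(x - self.mean) > k * self.variance ** 0.5
```

The filter is placed on the path of a critical variable, for example `speed = speed_filter(raw_speed)`, while the statistics object is fed the accepted values and consulted before each new one.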

Self-checking techniques can be used at different levels: system level,
program level, module level, instruction level, and data level. Software
redundancy at different levels is employed to check the system at that level.
The lower the level, the narrower the scope of error detection, but the
finer the error resolution. They can also be implemented at different
levels of languages, such as design languages, high level languages, machine
languages and microprograms. However, these techniques are of an ad hoc nature
and their error detection and correction capabilities vary greatly, as does
the cost of implementing and using them. In order to use them effectively
in error-resistant software design, overall consideration must be given to
the entire system design. In the following sections, we are going to discuss
some of these system design techniques.


4. SYSTEM DESIGN OF ERROR-RESISTANT SOFTWARE. There have been many
methods dealing with some specific aspect of error-resistant software system
design, such as operating systems and system structures. Most of these
techniques deal with the design of error-resistant operating systems. Because
two survey papers have appeared recently [10,11], our discussions here will
emphasize the system structures for error-resistant software.

The first comprehensive system structure we would like to discuss
was proposed by Randell [12]. This structure is based on the recovery block
scheme of Horning et al. [13]. Basically, a recovery block consists of
a conventional block with error detection capability (an acceptance test)
and additional alternates. A possible format for simple recovery blocks is as
follows:


ENSURE    (acceptance test)
BY        (primary alternate)
ELSE BY   (2nd alternate)
   ...
ELSE BY   (Nth alternate)
ELSE ERROR


The primary alternate corresponds exactly to the block of the equivalent
conventional program, and is probably the most efficient one to perform the
desired function. The acceptance test, which is a logical expression without
side effects, is evaluated on exit from any alternate to determine whether the
alternate has performed satisfactorily. A further alternate, if one exists, is
entered if the preceding alternate fails to complete or fails the acceptance
test. Before an alternate is so entered, the state of the process is restored
to that just before entry to the primary alternate. If the acceptance test is
passed, further alternates are ignored, and the statement following the
recovery block is the next to be executed. However, if the last alternate
fails, then the entire recovery block is regarded as having failed so that the
block in which it is embedded fails to complete and recovery should be
performed at that level. It should be pointed out that nested recovery blocks
are often used.
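
The control flow of a recovery block can be sketched as a single function. This is an illustration under stated assumptions, not Randell's implementation: the process state is assumed to be a dictionary, the recovery point is a full copy of that state taken on entry, and all names are hypothetical.

```python
import copy


class RecoveryBlockFailed(Exception):
    """Every alternate failed; recovery must continue in the enclosing block."""


def recovery_block(state, acceptance_test, alternates):
    """ENSURE acceptance_test BY alternates[0] ELSE BY ... ELSE ERROR."""
    recovery_point = copy.deepcopy(state)   # state before the primary alternate
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):      # must have no side effects
                return state
        except Exception:
            pass                            # alternate failed to complete
        # Restore the state before entering the next alternate.
        state.clear()
        state.update(copy.deepcopy(recovery_point))
    raise RecoveryBlockFailed("all alternates rejected")


def faulty_sort(state):
    state["data"].sort()
    state["data"].pop()                     # injected error: loses an item


def simple_sort(state):
    state["data"] = sorted(state["data"])


def sorted_and_complete(state):
    data = state["data"]
    ordered = all(data[i] <= data[i + 1] for i in range(len(data) - 1))
    return ordered and len(data) == state["n"]
```

With `state = {"data": [3, 1, 2], "n": 3}`, the call `recovery_block(state, sorted_and_complete, [faulty_sort, simple_sort])` rejects the primary alternate, restores the original list, and accepts the second alternate. Copying the whole state is the simplest choice and saves more than is strictly needed.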

In  using  the  recovery  block  scheme  to  design  error-resistant  software, 
we  have  to  consider  the  following  items.  First,  we  need  effective  acceptance 
tests for the individual alternates. Although some of the self-checking
techniques discussed above, among others [14], may be used as acceptance
tests,  methods  for  generating  simple  and  powerful  tests  must  be  further 
developed  in  order  to  make  this  approach  practical.  This  is  probably  the  most 
crucial  part  for  further  development  of  this  approach.  Secondly,  regarding 
various  alternates,  the  primary  one  is  the  most  efficient  one  to  perform  the 
function,  and  other  alternates  might  perform  the  same  function  in  some 
different  manner,  presumably  less  economically  (one  source  of  the  other 
alternates  may  be  earlier  releases  of  the  primary  alternate).  Thirdly,  the 
automatic  restoring  of  the  system  state  is  necessary  to  simplify  error 
recovery. Whenever a process has to be backed up, it is returned to the state
that it had reached just before entry to the primary alternate. Therefore,
the  only  values  that  have  to  be  reset  are  those  of  nonlocal  variables  that 
have  been  modified.  Since  no  explicit  restart  information  is  given,  it  is 
not  known  beforehand  which  nonlocal  variables  should  be  saved.  Various 
versions  of  a mechanism  which  arranges  that  nonlocal  variables  are  saved  in  the 
so-called  "recursive  cache"  have  been  developed.  This  is  accomplished  by 
detecting,  at  execution  time,  assignments  to  nonlocal  variables,  and  in 
particular,  by  recognizing  when  an  assignment  to  a nonlocal  variable  is  the 
first to have been made to that variable within the current alternate.
Thus, precisely sufficient information can be preserved. Finally, error
recovery among interacting processes must be carefully considered. When a
process is implemented by a series of recovery blocks, the point to which the
process must roll back in case of an error is not necessarily the starting
point of the recovery block containing such an error, if this process
interacts with other processes. An example is illustrated in Fig. 4, where each
of the three interacting processes has a series of four recovery blocks.
Should Process A fail at the end, it will be backed up to its latest, the
fourth recovery point, but the other processes will not be affected. If Process
B fails at the end, it will be backed up to the third recovery point of
Process A. However, if Process C fails at the end, it has to be backed up
all the way to the starting point of the three processes. This type of
domino effect can occur when two particular circumstances exist in combination.

• The recovery block structures of the various processes are uncoordinated,
and take no account of process interdependencies caused by their
interactions.

• The processes are symmetrical with respect to error propagation -
either of two interacting processes can cause the other to back up.
By removing either of these circumstances, the danger of the domino effect
can be avoided. To avoid the first circumstance, process interactions
should be properly structured. To avoid the second circumstance, the
concept of multilevel processes can be used. For more detailed discussion of
these results, the reader is referred to [12]. A further study on the
coordination of recovery among interacting processes has been made by Kim [15].

Recognizing the fact that a very large portion of the errors normally
occurs in application programs, and the fact that error-resistant operating
systems may be designed using different methods, Yau, Cheung and Cochrane [1]
proposed an error-resistant software system structure with the emphasis on
dealing with errors in the execution of application programs. This approach
uses the module level as the basic level of software error detection and
recovery, and adopts the concept of the recovery block. The main distinction
of this approach is that, in order to protect the integrity of the error
detection and recovery code against possible damage caused by the occurrence
of errors, this code is executed in a different protection domain from that of
the application program modules. All the reliability code is separated from
the programs and placed under the jurisdiction of a piece of software called
the System Monitor. The System Monitor is then responsible for monitoring
the application programs to make sure that they are executed reliably.

In a system with a hierarchical protection structure such as Multics [16],
the System Monitor will operate at a protection level between the level of the
operating system and the application programs. It contains some user-defined code,
but also shares some of the capabilities of the operating system, such as
communication with other operating processes and handling of errors detected
by hardware. During execution, the System Monitor has the responsibility
of ensuring that all interactions that a module has with other objects in
the system are reliable. As each module in a program ends execution, regardless
of whether it is because the module finished its processing or because
the hardware detected an error, it will return control to the System Monitor.

If  the  processing  was  completed,  the  System  Monitor  will  then  evaluate  the 
functional  reliability  of  the  module's  execution.  If  the  module  performs  the 
specified function with acceptable performance, it will determine which module
should  be  given  control  next  and  also  what  data  values  should  be  passed 
to  the  module's  successors.  It  will  then  give  control  to  the  selected 
module.  If  an  error  is  detected  at  any  point,  the  System  Monitor  will  initiate 
recovery  procedures.  This  scheme  guarantees  that  the  reliability  code  will 
always  be  executed,  and  furthermore,  it  ensures  that  it  will  be  executed  in 
a reliable  environment. 

The  internal  structure  of  the  System  Monitor  is  shown  in  Fig.  5.  There 
are  five  types  of  components  in  the  System  Monitor:  the  Internal  Process 
Supervisors  (IPS),  the  External  Process  Supervisors  (EPS),  the  Interaction 
Supervisor (IS), the System Monitor Kernel (SMK), and the Maintenance Program
(MP). The term Internal Process here refers to the execution of a program,
and  the  term  External  Process  refers  to  the  access  of  global  data  shared 
among  modules  or  programs. 

There  is  one  IPS  for  each  internal  process.  It  is  responsible  for 
checking  whether  the  execution  of  its  associated  program  is  reliable.  Its 
components are the Program Supervisor and the Module Supervisor. There
is one Program Supervisor per program and one Module Supervisor per module.


The Program Supervisor is responsible for monitoring the flow of control
and data between modules. The Module Supervisor is responsible for checking
the functional reliability of the modules. Each module performs a
function on the input parameters. Each module in a user's program may
have the format shown in Fig. 6. It is essentially a simple recovery
block, except that it has both an entrance block, containing validation tests
to determine if the input data satisfy the specifications, and an acceptance
block, containing acceptance tests to determine if the module has
successfully processed the input data and to validate the output values before
they are stored back to the global data structures. However, both the
entrance block and acceptance block are stored in the Module Supervisor, and
only the functional block, which contains the alternate versions performing
the processing function, is actually considered to be in the module of the
internal process shown in Fig. 5. Note that a module acts like an input-output
transformer, and all global data shared among modules have to be passed as input
and output parameters.
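The module format described above (entrance block, functional block with alternate versions, acceptance block) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation given in [1]: the names run_module and ModuleFailure, and the sorting example, are hypothetical. Because a module is treated as an input-output transformer, the sketch keeps no global state to restore between alternates.

```python
from typing import Callable, List, Sequence


class ModuleFailure(Exception):
    """Hypothetical error: raised when the module cannot deliver an acceptable result."""


def run_module(
    inputs: List[int],
    entrance_test: Callable[[List[int]], bool],
    alternates: Sequence[Callable[[List[int]], List[int]]],
    acceptance_test: Callable[[List[int], List[int]], bool],
) -> List[int]:
    # Entrance block: validate the input data against the specification.
    if not entrance_test(inputs):
        raise ModuleFailure("input data rejected by entrance test")
    # Functional block: try each alternate version in turn.
    for alternate in alternates:
        try:
            result = alternate(list(inputs))  # each alternate gets a fresh copy
        except Exception:
            continue  # a crash is treated like a failed acceptance test
        # Acceptance block: validate the output before it is released.
        if acceptance_test(inputs, result):
            return result
    raise ModuleFailure("all alternates failed the acceptance test")


def buggy_sort(xs: List[int]) -> List[int]:
    return xs  # primary alternate with a deliberate fault: it forgets to sort


def safe_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def is_ordered(inputs: List[int], result: List[int]) -> bool:
    same_size = len(inputs) == len(result)
    return same_size and all(a <= b for a, b in zip(result, result[1:]))


sorted_values = run_module([3, 1, 2], lambda xs: len(xs) > 0,
                           [buggy_sort, safe_sort], is_ordered)
```

In this sketch the faulty primary alternate is rejected by the acceptance test and the second alternate supplies the result, which is the behavior the recovery block is meant to provide.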

There is one EPS created by the programmer for each global data structure.
It is responsible for checking whether reliable data are available to the
program modules. A global data structure may be shared among modules of
a program or among several concurrent processes corresponding to different
programs. The EPS contains the error detection routines as well as the
information and code needed for recovery. In addition, the EPS may also
include facilities that support abstract data types, such as operation clusters
[17]. If an error is detected by the EPS, it may retry the access from
the storage device; it may use some error correcting codes in conjunction
with the data store; or it may attempt to repair the damaged data by
reconstructing them from redundant information stored by the EPS or from a safe
backup copy. The EPS's of the data used by a program must cooperate with
the IPS of that program during its recovery. For data structures shared by
several concurrent internal processes, the synchronization requirements
will be enforced by the next component of the System Monitor, the Interaction
Supervisor.

The Interaction Supervisor (IS) is a portion of the operating system
which is responsible for ensuring the reliability of interactions among processes.
When an internal process attempts to interact with an object (another internal
process or external process) in its external environment, it must do so via
the IS. The IS will first determine whether the process is authorized to
engage in the interaction. If the interaction is permissible, the IS will
provide the facilities required for the interaction, such as process
synchronization and message buffers, and then supervise the interaction to make
sure that it is completed reliably. The IS may be implemented as an operating
system monitor [18,19]. In the event that an interacting process fails,
the IS will coordinate recovery among the processes that the faulty process
has interacted with.

If  the  System  Monitor  is  to  be  able  to  guarantee  the  reliable  operation 
of  the  application  software,  it  must  have  some  capability  of  ensuring  its 
own  integrity.  The  System  Monitor  has  a component  in  the  operating  system, 
called  the  System  Monitor  Kernel,  which  has  the  responsibility  of  checking 
the  integrity  of  the  Process  Monitor.  There  will  be  a Maintenance  Program 
stored  in  the  operating  system  portion  of  the  System  Monitor.  It  will  be 
run during periods of light processing loads to attempt to "repair" faulty
processes  in  the  system. 

The detailed description of the operation of the System Monitor and
handling of various errors is given in [1]. The practicality of this
approach, similar to that proposed by Randell [12], depends largely on
the development of effective entrance tests and acceptance tests, which
depends strongly on the application (function of the program) and
modularization. Various schemes for implementing the individual components of the
System Monitor must be developed.


5. SPECIFICATION OF ERROR-RESISTANT SOFTWARE. The development of a large-scale
software system starts from the analysis of the user's requirements, and
the results of this analysis will produce some requirement documents [20,21].
The requirements of a system, including functional, performance, and cost
requirements, can be stated in a more formal format, which can be used as a set
of guidelines during the design stage, much like a blueprint for deriving the
system specification. It is well known that the high quality of the
specification of the software system is the key to a successful design [22].

The specification of a software system consists of the control structure,
which includes the data flow, and data types. Following certain guidelines,
such as those suggested by Parnas [23], the system can be decomposed into
subsystems and modules, and then the control structure of the system can
be established. Although some specification techniques of the control
structure have been developed [24,25], little work in the area of specification
of data types has been done for error-resistant software design.

The concept of abstract data types was originally developed for high
level programming languages; the word abstract indicates that it is
implementation independent. The main purpose of data abstraction in a programming
language is to support the capability for constructing a more reliable
program. An abstract data type consists of a set of data objects, a set of
operations that act upon the objects, and a set of descriptions that
characterize the operations. For example, a stack with elements being integers
and five stack operations, namely, POP, TOP, PUSH, NEWSTACK, and ISNEWSTACK
can be specified, using the algebraic specification approach [26], as follows:

TYPE Stack [item]

DECLARE NEWSTACK → Stack
        PUSH (Stack, item) → Stack
        POP (Stack) → Stack
        TOP (Stack) → item
        ISNEWSTACK (Stack) → Boolean
FOR ALL s ∈ Stack, i ∈ item LET
        ISNEWSTACK (NEWSTACK) = true
        ISNEWSTACK (PUSH (s,i)) = false
        POP (NEWSTACK) = NEWSTACK
        POP (PUSH (s,i)) = s
        TOP (NEWSTACK) = UNDEFINED
        TOP (PUSH (s,i)) = i
END

END STACK
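One way to see what the six equations of this specification demand is to check them against a concrete model. The sketch below is an assumption made for illustration only: it models a stack as an immutable Python tuple and uses None to stand in for UNDEFINED. Neither choice comes from [26], and check_axioms is a hypothetical helper.

```python
from typing import Optional, Tuple

Stack = Tuple[int, ...]
UNDEFINED = None  # stands in for the UNDEFINED value of the specification


def newstack() -> Stack:
    return ()


def push(s: Stack, i: int) -> Stack:
    return s + (i,)


def pop(s: Stack) -> Stack:
    return s[:-1]  # for the empty tuple this returns the empty tuple again


def top(s: Stack) -> Optional[int]:
    return s[-1] if s else UNDEFINED


def isnewstack(s: Stack) -> bool:
    return s == ()


def check_axioms(s: Stack, i: int) -> bool:
    """Evaluate the six equations of the specification for one choice of s and i."""
    return (
        isnewstack(newstack()) is True
        and isnewstack(push(s, i)) is False
        and pop(newstack()) == newstack()
        and pop(push(s, i)) == s
        and top(newstack()) is UNDEFINED
        and top(push(s, i)) == i
    )
```

Checking a handful of stacks is of course a test and not a proof; the point of the algebraic approach is that the equations themselves, not any one model, define the type.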

This example gives only an implicit definition of the possible values
of the data objects (the stack itself and the integers), of the operations,
and of their interactive behavior. However, the details on how these
data objects and operations are implemented need not be considered at the
design stage, and hence the designer will have more flexibility in developing
the correct structure of the program during the early part of the design
stage. Such a facility in a programming language will give the designer
more convenience and control over his development of the program, and also
more confidence in the reliability of the final software product because
this idea is consistent with the concept of step-wise refinement. For
the development of large-scale software systems, the same strategy can also
be adopted in the design methodology to reduce the development cost and
still retain the reliability of the system. Moreover, the abstract data types
can be treated as system resources, such as disks and other I/O devices,
and be controlled and protected under some mechanism, so that the overall
protection problem in the development cycle can be simplified.

Although the concept of abstract data types has been applied to the
development of software systems, as in HOS [25], the lack of a formal
specification technique for abstract data types means the concept is still not an
intrinsic one to the design methodology [27]. Here, the formality of the
specification techniques means the specification can be constructed following
certain rigorous disciplines and the resultant specification can be
proved correct. Some work has been done in this area [26,28,29], but the
results have to be made more practical for the designer to use [29] and
effective methods for checking the correctness of the abstract data type
specification need to be developed [26,27]. Furthermore, the effects
of possible errors that may be associated with the abstract data types
must also be studied. For example, requesting the top element of a stack
will be reasonable only if the stack is not empty; otherwise, it will be
an undefined condition as shown in the previous example. Although we
need not know the details on how to recover from error conditions during
the design stage, they will become more clear during the implementation stage
if there are some error detection and processing mechanisms in the
specification. For example, in the previous example of the stack, a check on the
emptiness of the stack before executing the operation POP will avoid an
illegal execution. With some modifications of the specification, the
example we just mentioned can be specified as follows:

TYPE: Stack [n, item]

ASSERTIONS: n ≤ 100

OPERATIONS:

POP:
    IF (n .EQ. 0) THEN ERROR 1
    ELSE s = POP (s,n-1)

BLOCK ERROR 1
BEGIN
    MESSAGE */Stack is empty/*;
END
END ERROR 1;

Here, n represents the number of elements in the stack s, and we use
an assertion to limit it to at most 100 elements. This is just the same
as performing an input validation test on the data object n as suggested
in [1].
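A minimal sketch of how such a specification might surface in an implementation is given below, in Python. It is an illustration under assumptions, not a construct from [1]: the class name BoundedStack, the exception StackError, and the "Stack is full" message for the violated assertion are all hypothetical; only the empty-stack error block appears in the specification above.

```python
from typing import List


class StackError(Exception):
    """Hypothetical counterpart of the ERROR blocks in the specification."""


MAX_ELEMENTS = 100  # the assertion n <= 100


class BoundedStack:
    def __init__(self) -> None:
        self._items: List[int] = []

    @property
    def n(self) -> int:
        # n is the number of elements currently in the stack.
        return len(self._items)

    def push(self, item: int) -> None:
        # Enforce the assertion before the stack is modified.
        # (This error case is an assumption; the source shows only ERROR 1.)
        if self.n >= MAX_ELEMENTS:
            raise StackError("Stack is full")
        self._items.append(item)

    def pop(self) -> None:
        # ERROR 1: check for emptiness before executing POP.
        if self.n == 0:
            raise StackError("Stack is empty")
        self._items.pop()
```

The check in pop plays the role of an entrance test: the illegal operation is detected before any data are changed, so no recovery of the stack contents is needed.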

In order to have an effective error-resistant software design methodology,
it is necessary to have an integrated specification technique that can
unambiguously and clearly specify the control structure as well as the desired
design data type. The error-resistant software specification generated
should  be  proved  to  be  correct.  Features  such  as  error  detection  and 
processing  should  be  an  integral  part  of  the  specification  technique  to 
guarantee  the  completeness  and  correctness  of  the  specification.  As 
discussed  before,  much  work  must  be  done  in  this  area. 


6. CONCLUSIONS AND FUTURE WORK. Error-resistant software can provide
ultra reliability and continuous availability of computing resources, which
are vital in many critical real-time applications. From the discussion in
this paper, we can see that substantial progress has been made in this area
during the past several years. However, in order to have a complete and
cost-effective methodology for error-resistant software design, much further
research and development have to be done in this area. Many specific problems
that need to be solved, ranging from system requirements and specifications
to structures and testing (checking), have also been discussed in this
paper. It is our hope that this paper will stimulate the interest of more
researchers to work in this important area so that cost-effective
methodologies for large-scale error-resistant software design can soon be
developed.

Fig. 1. An example to illustrate the relay runner scheme
(for detecting illegal branches in a control
structure during execution).


Fig. 2. A simple checksum scheme for checking the integrity of
data value.


An upper bound XMAX and a lower bound XMIN for the value
of a variable X are specified by the programmer. The
following statements are inserted after each assignment of X:

X = ...
IF (X.GT.XMAX) CALL ERROR (X)
IF (X.LT.XMIN) CALL ERROR (X)

Fig. 3. A data filter for checking the range of values of a data
variable.


Fig. 6. A user program module used in the error-resistant software
design given in [1].


ABSTRACT. A new synthesis of correctness methodology is needed to
provide  reasonable  explanations  of  even  simple  programs  as  illustrated 
by  example.  The  example  suggests  that  current  approaches  to  correctness 
must  be  modified  to  present  understandable  and  convincing  arguments  that 
a program  accomplishes  its  intended  objectives. 

Our  approach,  as  illustrated  by  example,  defines  a class  of  primary 
program  descriptions.  A primary  program  description  is  to  meet  the 
following  criteria: 

a.  It  is  understood  by  the  intended  audience. 

b.  It  is  algorithmically  transferred  to  the  object  program. 

c.  Its  correctness  is  convincing  to  the  intended  audience. 

In  the  present  paper  we  focus  on  some  linguistic  issues  in 
primary  program  descriptions  in  so-called  pidgin-ALGOL  programs. 

1. INTRODUCTION. The software problem is now, and for the foreseeable
future will be, one of the major technical problems with which we
will have to contend. The scope of the problem has been documented by
Boehm [1]:

"The annual cost of software in the U.S. is approximately 20
billion dollars. Its rate of growth is considerably greater than that
of the economy in general. ... A recent study by Fisher [2] found that
DoD's software expenditure during FY 1973 was roughly $2.9-3.6 billion."

The  following  data  of  Brandon  [3]  accentuate  the  gravity  of  the 
problem: 

                                       1976          1980          1986

Total expenditure on computing*     $35 billion   $90 billion   $139 billion

% of GNP accounted for by the
computing industry                       3%            6%            7%

*Includes hardware, programming, supplies, etc.; as a comparison, the 1975
figure for car sales in the U.S.A. was $38 billion."


Clearly  with  a problem  of  such  magnitude,  an  extensive  scientific 
endeavor  at  different  levels  is  appropriate  to  try  to  control  the  problem. 
Indeed,  a variety  of  approaches  is  being  tried,  and  we  shall  briefly 
survey  these  in  Section  2. 

In  Section  3,  we  give  an  example  of  a relatively  simple  program  which 
illustrates,  in  miniature,  the  software  problem.  This  program  is  much 
smaller  and  less  complex  than  typical  operational  programs,  and  thus  allows 
us  to  deal  with  it  in  this  paper.  Still  even  this  miniature  program  is 
quite  difficult  enough. 

Section  4 starts  with  an  analysis  of  the  difficulties  of  programming, 
as  exemplified  in  Section  3,  from  both  the  theoretical  and  practical  points 
of  view.  We  then  describe  some  of  the  criteria  for  a solution  to  the  problem, 
in  what  we  call  a primary  program  description. 

2.  APPROACHES  TO  THE  SOFTWARE  PROBLEM.  The  literature  on  approaches  to 
the  software  problem  is  so  extensive  already  that  a complete  bibliography  or 
survey  would  be  an  encyclopedic  work.  Even  an  incomplete  and  selective 
annotated  bibliography  in  1975  [4]  contained  over  300  items,  approximately 
3/4  of  these  from  the  1972-1974  period.  Further,  the  rate  at  which  articles 
and  books  on  the  software  problem  appear  seems  to  still  be  increasing  with 
new  journals  and  conferences. 

In  view  of  the  vast  literature,  we  shall  only  try  to  give  a brief 
sampler  of  some  of  the  major  approaches  to  the  software  problem.  For  our 
purposes,  we  classify  the  approaches  in  four  categories:  management 
approaches,  formal  correctness,  mechanized  verification,  and  documentation. 
This  certainly  does  not  exhaust  the  possibilities.  Table  1 contains  a 
listing  of  the  headings  under  which  papers  at  a recent  software  engineering 
conference  [5]  were  organized  and  is  instructive.  (This  conference  alone 
had  over  100  papers).  Many  of  the  approaches  described  in  the  literature 
combine  two  or  more  of  our  four  categories.  Still  others  concentrate 
either  on  a particular  class  of  software,  or  the  requirements  of  a specific 
group  of  users. 


The first of our four categories is management. Here, two approaches
in particular, chief programmer team [6] and egoless programming [7],
indicate the diversity. Essentially, the chief programmer team concept can be
understood as structurally similar to a surgical team. The chief
programmer is aware of and responsible for all of the programming process,
although some of the details are delegated to other members of
the team. In the egoless programming approach, a less hierarchical
management structure is postulated, and the emphasis is placed on
information exchange among programmers.


Table 1. Categories of Papers in the Software Engineering Conference

Requirements Definition
Program Synthesis Techniques
Operating Systems
Requirements Engineering
Education
Operating Systems and Networks
Performance Evaluation
Programmer's Workbench
Software Design and Development
Design of Large Programs
Programming Languages and Systems
Software Design
Software Engineering in the Department of Defense
Software Verification and Validation
Program Proving and Verification
Theoretical Aspects of Software Engineering
Software Fault Tolerance
Validation and Testing
Data Bases
Case Studies
Software Automated Tools
Programming Languages
Software Modeling
Design Specification and Management

Another category is formal correctness, which has been especially
emphasized in research-based approaches. Here the papers which marked
the beginning of the major activity in proving correctness were Floyd's
[8] and a variant later introduced by Hoare [9]. The basic idea in
these approaches is that a specification of the program requirement is
written in a formal language, typically a first order logic, and one
then attempts to prove, using the deduction rules of the formal system,
that the program satisfies the specification.

A variant of this approach is developed by Dijkstra [10]. In
Dijkstra's method, a specification of the program is written in a formal
language, and the program is developed from the specification.

A number of experimental implementations of programs to perform
verification have been developed. King [11] was the earliest of these,
implementing Floyd's method for a (decidable) subset of Algol. Other
systems have concentrated on other languages with a different style;
e.g. Boyer and Moore [12] on LISP. Experimental systems intended for
program development, rather than verification, have also been described,
such as Burstall and Darlington's interactive system [13].

The approaches based on formal logic have been described in a number of
textbooks, of which the most widely used is Manna [14]. More recently, Manna
and Waldinger [15] and London [16] have surveyed both the theory of proofs
that programs meet their specifications, and experimental systems.

The logic based approach has also been the subject of some controversy.
DeMillo, Lipton, and Perlis [17] and Dijkstra [18] provide a lively exchange
on the subject.

Finally, in the area of documentation, Teichroew and Hershey [19] is
perhaps the best known. PSL/PSA (problem statement language/problem
statement analyzer) uses the computer to store and analyze requirements
(this is presumably synonymous with specification) and assists in
preparing documentation. A less ambitious documentation program that is
also well-known is Mills [21]. Mills starts with the program text, and
uses an interactive system to develop documentation to annotate the
program.


In  the  next  section  we  will  discuss  a "toy"  program  and  return  to 
consider,  in  the  light  of  this,  what  should  be  the  nature  of  a solution 
to  the  software  problem. 



3.  EXAMPLE.  The  program  in  this  section  is  not  large,  but  it  is 
also  not  easy  to  understand.  We  will  examine  this  program  in  detail  and 
then  attempt  a presentation  of  this  program  so  that  its  design  is  displayed. 

The  example  is  a program  for  inverting  a permutation  without  using 
an  additional  array.  This  program  appeared  in  1964,  but  was  revised  and 
simplified  to  appear  again  the  next  year.  Since  1965  we  have  seen  other 
presentations of this program (Burstall [22], Knuth [23]), but they are
not  unlike  the  1965  version  which  is  shown: 

for m := n step -1 until 1 do
begin i := A[m];
    if i < 0 then A[m] := -i
    else if i ≠ m then
    begin k := m;
        while i ≠ m do
        begin j := A[i]; A[i] := -k;
            k := i; i := j
        end;
        A[m] := k
    end
end


Program 1: A program that inverts a permutation, A, in situ.
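For readers who wish to run Program 1, a line-for-line transliteration into Python is sketched below. It is an assumption about how the ALGOL text maps onto Python, not part of the 1965 publication: the function name invert_in_place is hypothetical, and A[0] is an unused placeholder so that the indices 1..n match the ALGOL array.

```python
from typing import List


def invert_in_place(A: List[int]) -> None:
    """Invert the permutation stored in A[1..n]; A[0] is unused padding."""
    n = len(A) - 1
    for m in range(n, 0, -1):          # for m := n step -1 until 1
        i = A[m]
        if i < 0:
            A[m] = -i                  # already inverted: remove the marker
        elif i != m:                   # skip singleton cycles
            k = m
            while i != m:              # walk the cycle that contains m
                j = A[i]
                A[i] = -k              # store the inverse value, marked negative
                k = i
                i = j
            A[m] = k                   # element m is left unmarked


# The permutation 4 2 5 7 3 1 6 is the example used later in this section.
perm = [0, 4, 2, 5, 7, 3, 1, 6]
invert_in_place(perm)
```

After the call, perm[1..7] holds 6 2 5 1 3 7 4, which agrees with the inverse permutation B tabulated in this section.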


Before examining this program we must agree on some definitions.
A permutation is an arrangement of n distinct objects in a row. If the
objects are the integers {1,2,...,n} and A[1] A[2] ... A[n] is a
permutation, then the inverse permutation B[1] B[2] ... B[n] is the
rearrangement that undoes the effect of A. Thus if i goes to j under A
then j goes to i under B.

Every permutation consists of one or more disjoint cycles. An
example of a permutation along with its cycle structure is:

   i:  1 2 3 4 5 6 7
A[i]:  4 2 5 7 3 1 6

Cycles: 1 → 4 → 7 → 6 → 1,  2 → 2,  3 → 5 → 3

The corresponding inverse permutation, B, and its cycle structure are:

   i:  1 2 3 4 5 6 7
B[i]:  6 2 5 1 3 7 4

Cycles: 1 → 6 → 7 → 4 → 1,  2 → 2,  3 → 5 → 3

The cycle structure of B is similar to that of A except that the arrows
have reversed directions.

There is a very simple method for computing the inverse of A: set
B[A[i]] := i for 1 ≤ i ≤ n. Then B[1] B[2] ... B[n] is the desired
inverse. This method uses 2n memory cells, n for each array.
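The two-array method is short enough to state directly in Python. The sketch below is illustrative only; the function name inverse and the unused cell at index 0 (kept so that indices run from 1 to n as in the text) are assumptions.

```python
from typing import List


def inverse(A: List[int]) -> List[int]:
    """Return B with B[A[i]] = i for 1 <= i <= n; index 0 is unused padding."""
    n = len(A) - 1
    B = [0] * (n + 1)      # the second array of n cells
    for i in range(1, n + 1):
        B[A[i]] = i        # if i goes to A[i] under A, then A[i] goes to i under B
    return B


B = inverse([0, 4, 2, 5, 7, 3, 1, 6])
```

This version serves as a convenient reference against which any in situ program can be compared.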

The  concern  for  memory  storage  requirements  can  be  reduced  by 
computing  the  inverse  without  using  an  additional  array,  so  that  when 
the program is finished A[1] A[2] ... A[n] is the inverse of the
original  permutation. 

An outline of the algorithm is:

For each m between 1 and n do
    if cycle beginning at m has not been inverted then
        invert and mark each element of the cycle (except for m)
    else
        remove the marker from this element (m).

This algorithm inverts one cycle at a time, marking each element (with a
negative sign (-)). When the algorithm later encounters one element with
a marker, the marker is removed.

Program  1 has  few  variables.  The  idea  of  the  algorithm  is  simple, 
but  the  program  causes  us  difficulty.  How  do  we  determine  correctness 
for  this  program?  How  can  we  retain  the  link  between  the  algorithm  and 
the program so that when the program is written the design is not thrown
away? 


The  communication  medium  between  the  programmer  and  the  computer  is 
many  times  the  source  of  the  problem.  When  we  must  contort  our  ideas  to 
fit  the  syntax  of  a particular  programming  language,  the  idea  is  sometimes 
lost. Luckily we are beginning to realize that a program's purpose is not
to  instruct  the  computer  but  that  the  computer  is  there  to  execute  our 
program. 

Nonetheless,  even  if  we  accept  the  constraints  that  machines  and 
programming  languages  impose  on  us,  something  can  still  be  done  to  better 
represent the idea of Program 1.

One  major  cause  of  errors  in  a program  is  the  concern  for  (machine) 
efficiency.  Only  if  we  first  know  that  a program  is  correct  can  we  then 
be  concerned  about  efficiency.  If  a program  is  not  correct,  it  matters 
little  how  fast  it  runs. 

In line 4 of Program 1 there is a test for i ≠ m so that we can
avoid inverting a singleton cycle, even though the cycle inversion works
for a singleton cycle. But the average number of singleton cycles in a
permutation of size n is just 1 (Knuth [23], p. 178). So this attempt
to improve efficiency has actually decreased efficiency.
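The figure quoted from Knuth can be recovered by a one-line calculation, sketched here under the assumption that all n! permutations are equally likely. Element m forms a singleton cycle exactly when A[m] = m, which holds for (n-1)! of the n! permutations, so by linearity of expectation:

```latex
\[
E[\text{number of singleton cycles}]
  = \sum_{m=1}^{n} \Pr\{A[m] = m\}
  = \sum_{m=1}^{n} \frac{(n-1)!}{n!}
  = n \cdot \frac{1}{n}
  = 1 .
\]
```

The test is therefore executed for every cycle that is encountered, yet on average it spares the program only one (trivial) cycle inversion.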

Because  of  the  early  introduction  of  the  variable,  i,  it  is  not 
readily apparent that the test i ≠ m is a test for a singleton cycle. An
attempt  such  as  this  to  avoid  reevaluation  of  A [m]  is  usually  unnecessary 
because  most  compilers  can  recognize  a duplicate  expression  and  avoid  a 
recalculation  for  us. 

In  order  to  obtain  a better  program  we  should  return  to  the  design 
stage.  Another  outline  of  the  algorithm  is: 

For m := 1 to n do
    if cycle at m has not been inverted then
        invert and mark every element of the cycle
    remove the marker from this element.

It  is  more  natural  to  take  m from  1 to  n than  to  go  the  opposite  way. 

We  can  avoid  special  cases  by  marking  (with  a minus  sign  (-))  every 
element  of  the  cycle,  whereas  Program  1 leaves  element  m unmarked. 

The  most  difficult  part  of  Program  1 is  that  for  chasing  around  a 
cycle,  especially  the  part  for  moving  from  one  element  to  the  next.  All 
ways  of  moving  around  a cycle  are  similar  in  that  they  have  the  following 
outline: 


start at m
while not done do
begin
    process the current element
    move to the next element
end.

As  an  example,  consider  the  problem  of  finding  an  element  x which  we 
know  to  be  somewhere  in  the  cycle.  The  program  part: 

    i := m;
    while A[i] ≠ x do
        i := A[i];

will return i such that A[i] = x. Other than the array, A, and the value,
x, this program part requires one other variable, the variable i. As we
shall  see,  the  number  of  additional  variables  indicates  the  complexity 
of  the  process. 
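
The search can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, and the list is treated as 1-based with A[0] unused so that the indices match the text.

```python
def find_in_cycle(A, m, x):
    """Return i on the cycle through m such that A[i] == x.

    A is used as a 1-based array (A[0] is an unused placeholder).
    x is assumed to lie on the cycle; otherwise the loop does not terminate,
    exactly as in the program part it mirrors.
    """
    i = m
    while A[i] != x:
        i = A[i]          # move to the next element of the cycle
    return i
```

For the permutation A[1..5] = 3, 2, 5, 4, 1 the cycle through 1 is 1, 3, 5, so searching from m = 1 for x = 5 stops at i = 3.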

As  a second  example  of  loop  chasing,  assume  that  we  want  to  break 
the  cycle  into  all  singleton  cycles.  The  program  part: 

    i := m;
    j := A[i];
    while A[i] ≠ i do
    begin
        A[i] := i;
        i := j;
        j := A[i];
    end;




requires two additional variables, i and j. Before we modify A[i] to be i
we must have a way of getting to the next element, hence the need for
previously saving this next element in j.
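
A Python rendering of the same loop, offered as a sketch: the function name is hypothetical and the list is again assumed to be 1-based with A[0] unused.

```python
def break_into_singletons(A, m):
    """Replace the cycle through m by singleton cycles, in place (1-based A)."""
    i = m
    j = A[i]              # remember the next element before A[i] is overwritten
    while A[i] != i:
        A[i] = i          # element i now points to itself
        i = j             # move to the saved next element
        j = A[i]          # and save its successor in turn
    return A
```

The loop stops when it returns to m, because A[m] was the first entry set to point to itself.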

The  program  part  for  finding  an  element  x does  not  modify  the  cycle 
so there is no statement comparable to A[i] := i as in the last example;
yet,  both  examples  have  a similar  pattern  of  movement. 

In  both  examples  the  conditions  for  loop  termination  accurately 
represent the functions of the program parts. In the first program
part we desire to have A[i] = x. When this happens we leave the loop.

In the second program part we must have A[i] = i. When this happens
we leave the loop because this element (i) was previously visited.

When inverting a cycle we must have knowledge of three (adjacent)
elements, so we need three additional variables: the current element,
the last element (where we want the current one to point), and the next
element (where the current one now points). The program part:

    i := m;
    j := A[i];
    k := A[j];
    while A[j] ≠ -i do
    begin
        A[j] := -i;
        i := j;
        j := k;
        k := A[j]
    end

inverts a cycle and negates each element to indicate the visit.

Note that the terminating condition (A[j] = -i) and the first
statement of the loop (A[j] := -i;) are very similar.
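
To make the movement of the three variables concrete, the sketch below repeats the loop in Python and records the triple (i, j, k) at the top of each iteration. The function name and the recorded trace are illustrative additions; the list is assumed to be 1-based with A[0] unused, so that no entry is 0 and negation is a usable marker.

```python
def invert_cycle(A, m):
    """Invert the cycle through m in place, leaving every entry of the cycle negated.

    Returns the list of (i, j, k) triples seen at the top of each iteration:
    i is the last element, j the current one, k the next one.
    """
    trace = []
    i = m
    j = A[i]
    k = A[j]
    while A[j] != -i:
        trace.append((i, j, k))
        A[j] = -i         # current element now points back to the last one (marked)
        i = j             # the window of three adjacent elements advances by one
        j = k
        k = A[j]
    return trace
```

For A[1..5] = 3, 2, 5, 4, 1 and m = 1 the window advances through (1, 3, 5), (3, 5, 1), (5, 1, 3), and the cycle 1, 3, 5 is left as A[1] = -5, A[3] = -1, A[5] = -3.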
The part of the loop body for moving to the next element is
reminiscent of a centipede that is crawling. Note the effect of
the loop body on a segment of a cycle.


[Figure: positions of i, j, and k on a segment of a cycle after each statement of the loop body: A[j] := -i; i := j; j := k; k := A[j].]
We finally arrive at the program shown as Program 2.
    for m := 1 to n do
    begin if A[m] > 0 then
        begin i := m;
            j := A[i];
            k := A[j];
            while A[j] ≠ -i do
                begin A[j] := -i;
                    i := j;
                    j := k;
                    k := A[j]
                end
        end;
        A[m] := -A[m]
    end.


Program  2:  Invert  a permutation,  A,  in  situ. 

This  program  examines  each  element  m.  If  the  element  at  m is 
positive  then  we  have  found  a loop  that  has  not  yet  been  inverted.  So 
we invert it. Therefore at line 13, A[m] will always be negative.

The  assignment  statements  at  lines  3-5  and  at  lines  8-10  point  out 
that  i,  j,  and  k are  adjacent  elements  of  the  cycle  (the  last,  current 
and next elements). The test in line 6 (A[j] ≠ -i) points out that the
value A[j] is not what we wish it to be. The situation is thus remedied
in line 7.

Not only is Program 2 easier to understand than Program 1, but it
also executes faster. The execution time of Program 2 can be improved
even more (by inverting A[m] outside the while loop), but it will then
be a larger (more difficult to understand) program.
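
For readers who wish to run Program 2, a direct transcription into Python is sketched here. It is a translation for experimentation, not the authors' code: the function name is hypothetical, and the permutation is held in A[1..n] with A[0] unused so that the 1-based indexing and the use of negation as a marker carry over unchanged.

```python
def invert_permutation_in_situ(A):
    """Invert the permutation stored in A[1..n] in place (A[0] is unused)."""
    n = len(A) - 1
    for m in range(1, n + 1):
        if A[m] > 0:                  # cycle through m has not been inverted yet
            i = m
            j = A[i]
            k = A[j]
            while A[j] != -i:
                A[j] = -i             # invert and mark the current element
                i = j
                j = k
                k = A[j]
        A[m] = -A[m]                  # A[m] is negative here; remove the marker
    return A
```

After the call, A[A_old[m]] = m for every m, and no entry remains negative.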
4. PRIMARY PROGRAM DESCRIPTIONS. The example of the previous
section indicates that even very simple programs are difficult to
understand and, consequently, will be hard to certify with any degree of
assurance. Still we should like to suggest what we think the major
difficulties are in constructing correct programs.

Undecidability of Equivalence

Given  two  programs,  or  a program  and  a specification,  it  is  known  that 
there  is  no  algorithm  to  decide  whether  the  two  are  equivalent.  The 
consequence  of  this  theoretical  result  is  that  we  should  find  it  difficult  to 
show that a program and its specification, or that two programs
independently developed, are equivalent. For each of the programs given in the preceding
section,  a proof  that  the  program  meets  its  specification  is  indeed  somewhat 
difficult. 

The  Competence-Performance  Dilemma 


Another  reason  that  it  is  hard  to  write  correct  programs  is  that  we 
are  often  tempted  to  write  programs  that  are  about  as  complicated  as  we 
can  manage.  Precisely  because  these  programs  are  at  the  limits  of  human 
performance,  they  are  also  difficult  to  debug.  Of  course,  the  syntax  of 
programming  languages  only  deals  with  the  competence  aspect  of  language 
and  ignores  this  "human  engineering"  dimension.  Restricting  program 
text  to  a single  page,  or  eliminating  goto's  can  only  be  partial  solutions. 

What  you  see  is  not  what  you  get 

The program text presented as the solution to a problem does not
indicate its derivation, or the underlying principles used to arrive at
the program. This program deep structure [25] is essential to an
understanding of the program, and the correction of bugs almost invariably requires
such an understanding. As noted in [25] and [26], a program developed
rationally at the level of deep structure and then optimized via
transformations is perfectly acceptable, providing that the deep structure
and transformations are part of the documentation.

We  have  commented  elsewhere  [27]  on  the  particular  difficulties 
of  developing  and  understanding  programs  in  systems  that  require  the 
programmer to do optimizing transformations. An example which is generally
familiar is the one-pass vs. two-pass assembler. The two-pass assembler
is much clearer than the one-pass assembler, since the one-pass assembler
is essentially a transformed and optimized version of the two-pass assembler.

Primary  Program  Descriptions 

If  our  analysis  is  correct,  then  a proper  methodology  for  program 
correctness  must  be  based  on  the  following  principles: 


i)  There  must  be  a document  completely  describing  the  program. 


ii)  The  program  description  must  be  understood  by  its  intended 
audience at a level well within the limits of their performance.

iii)  The  documentation  must  include  all  of  the  deep  structural 
information,  and  all  of  the  relevant  transformations. 

We  will  call  a description  satisfying  these  criteria  a primary 
program  description  (PPD).  No  particular  syntax  is  prescribed  for 
the PPD, as the following examples illustrate:

a) A payroll program may use an annotated decision table as
a PPD.

b) A lexical scanner [28] may use an annotated finite state
transducer as a PPD.

c) A depth first search algorithm [29] may use an annotated
pidgin-Algol program as a PPD.

d) Various flowcharts or structured flowcharts with appropriate
commentary can serve as PPD [30].

The  important  point  is  that  if  different  forms  of  PPD  are  used 
with  the  same  program,  then  the  correspondence  between  the  PPD's 
must  be  established  algorithmically.  Moreover,  there  will  generally 
be  a syntax-directed  translation  between  PPD's  using  algebraic 
methods  as  described  in  [31]. 

Conclusion 


We  do  not  claim  that  what  we  have  presented  here  will  solve 
or  even  significantly  reduce  the  software  problem.  But  it  seems  to 
us  that  an  approach  to  correctness  should  include  the  concept  of 
a primary  program  description  as  a basis  for  program  management  and 
for  all  levels  of  documentation. 




References 


1. Boehm, B.W. Software Engineering: R&D Trends and Defense Needs.
Presented at the Conference on R&D Problems in Software. Brown
University, Oct. 1977.

2. Fisher, D.A. Automatic Data Processing Costs in the Defense
Department. Institute for Defense Analysis, Paper P-1046. Oct. 1974.

3. Brandon, D.H. Commercial Software in Software Portability: An
Advanced Course. Ed. P.J. Brown, Cambridge University Press,
1977, pages 203-212.

4. Newton, G.E. A Partially Annotated Bibliography of Top-Down and
Goto-less Programming. Proceedings IRIA Colloquium on Proving and
Improving Programs. Rocquencourt, France, July 1975, pages 435-473.

5. Proceedings 2nd International Conference on Software Engineering.
San Francisco, Cal., Oct. 1976.

6. Baker, F.T. Chief Programmer Team Management of Production Programming.
IBM Systems Journal, Vol. 11, No. 1, pages 56-73, 1972.

7. Weinberg, G.M. The Psychology of Computer Programming. Van Nostrand
Reinhold Co., 1971.

8. Floyd, R.W. Assigning Meaning to Programs in Proceedings of a
Symposium in Applied Mathematics. Vol. 19, J. Schwartz, Ed. AMS,
pages 19-32, 1967.

9. Hoare, C.A.R. An Axiomatic Basis for Computer Programming.
Communications ACM, pages 576-580, Oct. 1969.

10. Dijkstra, E.W. A Discipline of Programming. Prentice Hall, 1976.

11. King, J.C. A Program Verifier. Ph.D. Thesis, CMU, 1969.

12. Boyer, R.S. and Moore, J.S. Proving Theorems about LISP Functions.
JACM, Vol. 22, No. 1, pages 129-144, Jan. 1975.

13. Burstall, R.M. and Darlington, J. A Transformation System for Developing
Recursive Programs. JACM, Vol. 24, No. 1, pages 44-67, Jan. 1977.

14. Manna, Z. Mathematical Theory of Computation. McGraw Hill, 1974.




15. Manna, Z. and Waldinger, R. The Logic of Computer Programming.
Aug. 1977, Stanford Report #STAN-CS-77-611.

16. London, R.L. Program Verification. In Conference on R&D Problems
in Software, Brown University, 1977.

17. DeMillo, R.A., Lipton, R.J., and Perlis, A.J. Social Processes and
Proofs of Theorems. Proceedings 4th Symposium on the Principles of
Programming Languages, Los Angeles, pages 144-154, 1977.

18. Dijkstra, E.W. A Political Pamphlet from the Middle Ages. Unpublished
Response to [16].

19. Teicherow, D. and Hershey, III, E.A. PSL/PSA, A Computer Aided
Technique for Structured Documentation and Analysis of Information
Systems. Abstract in Proceedings 2nd International Conference on
Software Engineering, San Francisco, Cal., 1976, page 2.

20. Teicherow, D. and Hershey, III, E.A. Documentation Methodology.
In Proceedings, Conference R&D Problems in Software, Brown University,
1977.

21. Mills, H.D. Syntax-Directed Documentation for PL360. Communications
of the ACM, Vol. 13, pages 216-222, April 1970.

22. Burstall, R.M. Proving Programs as Hand Simulation with a Little
Induction. Proceedings, IFIP Congress, Stockholm, pages 308-312,
1974.

23. Knuth, D.E. Fundamental Algorithms. Addison Wesley, 1969.

24. Gries, D. An Exercise in Proving Parallel Programs Correct.
Communications, ACM, Vol. 20, No. 12, pages 921-930.

25. Levy, L.S. and Melville, R. The Algebraic Anatomy of Programs.
The Computer Journal, Nov. 1977.

26. Knuth, D.E. Programming with GOTO's. Computing Surveys, Dec. 1974.

27. Leathrum, J. and Levy, L.S., unpublished memo, July 1977.

28. Aho, A.V. and Ullman, J.D. Principles of Compiler Design. Addison
Wesley, 1977.

29. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis
of Algorithms. Addison Wesley, 1974.




30. Carberry, S., Khalil, H., Leathrum, J.F., and Levy, L.S. General
Computer Science. Charles E. Merrill, 1976.

31.  Levy,  L.S.  Discrete  Structures.  Wiley,  forthcoming. 




THE  BRL  BESSEL  FUNCTION  SUBROUTINE 


Kathleen  L.  Zimmerman  (Speaker) 
Alexander  S.  Elder  (Co-Author) 
Propulsion  Division 

U.S.  Army  Ballistic  Research  Laboratory 
Aberdeen  Proving  Ground,  Maryland  21005 
Autovon  283-3083 


ABSTRACT.  A subroutine  for  calculating  Bessel  functions  of  integral 
order  and  complex  argument  has  been  developed  at  the  BRL  and  has  been 
run  on  the  UNIVAC  1108,  the  BRLESC,  and  the  CDC  7600  computers.  The 
requirements  for  accuracy,  simplicity,  generality,  and  reduction  of 
round-off  error  have  been  satisfied  by  the  computational  methods  used. 
While  the  input  can  be  given  in  either  cartesian  or  polar  coordinates, 
all  of  the  calculations  are  carried  out  in  terms  of  cartesian  coordinates 
(x + iy = z, where z is complex).

Three  methods  of  calculation  were  chosen  to  compute  the  ordinary 
and  modified  Bessel  functions  of  the  first  and  second  kind: 

1. Weber-Schlafli series for |z| ≤ 2.5,

2. Gauss continued fractions and recurrence formulas for
2.5 < |z| ≤ 21.0, and

3. Hankel asymptotic series for |z| > 21.0.

The computation of the Weber-Schlafli series for small values of z is
straightforward. Terms of the infinite series are summed from the
smallest term to the largest term in order to avoid cancellation error.
Bessel  functions  for  large  values  of  z are  calculated  using  Hankel 
asymptotic  functions  in  the  right  half-plane.  Analytic  continuation  is 
used  to  compute  corresponding  functions  in  the  left  half-plane. 

Bessel  functions  of  the  first  kind  for  moderate  values  of  z are 
calculated  by  selecting  an  order  that  is  much  larger  than  |z|  and 
applying  a downward  recurrence  relation  using  a Weber-Schlafli  series 
approximation  for  the  starting  value.  Although  this  series  can  be  used 
to  start  a similar  recurrence  for  modified  Bessel  functions  of  the 
second  kind,  functions  of  lower  order  cannot  be  calculated  accurately 
as  the  difference  of  two  nearly  like  numbers  occurs  in  the  course  of 
the  computations.  The  innovative  use  of  Gauss  continued  fractions  has 
yielded accurate results for the computation of K_n(z). The ordinary
Bessel functions of the second kind are calculated in terms of Hankel
functions, which are linear combinations of ordinary Bessel functions.

The results of the subroutine are generally good to at least 13
or 14 significant figures for small and moderate orders; the accuracy
falls off, however, for the higher orders. The accuracy has been
verified for selected ranges of argument and order.


1. INTRODUCTION. The development of this subroutine began in the late
1960's when Mr. Elder, the principal investigator, realized that Bessel
functions of complex argument were needed to solve a general class of
problems involving the Laplace and biharmonic equations in cylindrical
coordinates. In particular, the evaluation of Fourier integrals by the
theory of residues required complex eigenvalues to a high degree of
accuracy. Although tables of Bessel functions in complex argument were
available at that time, they were limited in scope and accuracy.
Interpolation for intermediate values caused a further loss in accuracy.


The  mathematical  analysis  was  essentially  completed  in  1968  by  Mr. 
Elder.  Mrs.  Alene  Depue  completed  the  programming  shortly  thereafter. 
Due  to  a serious  illness  which  eventually  caused  her  early  retirement, 
Mrs.  Depue  did  not  document  the  program.  As  too  often  happens,  the 
computer  code  was  used  and  the  report  ignored.  In  1976  I was  assigned 
the  task  of  writing  the  report  on  this  subroutine.  Since  there  were 
virtually  no  notes  around,  it  was  necessary  to  reconstruct  the  entire 
derivation  before  beginning  to  write  the  report.  The  text  was  completed 
last  summer,  and  the  report  is  nearly  ready  for  publication. 

2.  OBJECTIVES  AND  SCOPE.  During  the  early  seventies  more  interest  was 
shown  in  this  work.  Other  good  codes  have  been  developed,  generally  for 
real variables, but it is felt that this one still meets the
requirements for accuracy, simplicity, generality, and reduction of round-off
error when calculations involving complex arguments are required. The
program calculates ordinary and modified Bessel functions of complex
argument and integral order from 0 to 40. Pairs of orders, m and n = m+1,
and  the  functions  of  both  the  first  and  second  kinds  are  calculated  for 
either  the  ordinary  or  modified  Bessel  functions  in  any  given  call  to 
the  subroutine. 

3.  STRUCTURE.  There  are  four  main  sections  in  the  code: 

a.  Preliminary  calculations 

b. Evaluation for small z: |z| ≤ 2.5

c. Evaluation for moderate z: 2.5 < |z| ≤ 21.0

d. Evaluation for large z: |z| > 21.0

4. COMPUTATIONAL METHODS. The first section converts the input, if it
was given in polar coordinates, to cartesian coordinates. If cartesian
coordinates are entered, the polar coordinate θ is
computed using the half-angle formula for ARCTAN since its principal
values lie between π/2 and -π/2. This should make the calculations
independent of any predefined ARCTAN subroutine. A few quantities which
appear regularly throughout the program are defined here. Tests are
performed to determine which method of calculation will be used
according to the value of rho.
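
One way such a half-angle computation can be arranged is sketched below in Python. The exact formula used in the BRL code is not given here, so this is an assumption: it uses the identities tan(θ/2) = y/(ρ + x) and tan(θ/2) = (ρ - x)/y, choosing whichever avoids subtracting nearly equal numbers. The function name is hypothetical.

```python
import math

def polar_angle(x, y):
    """Return theta in (-pi, pi] such that x = rho*cos(theta), y = rho*sin(theta).

    Only the principal value of arctan, which lies in (-pi/2, pi/2), is needed,
    because theta/2 always lies in that range.
    """
    rho = math.hypot(x, y)
    if rho == 0.0:
        return 0.0                               # angle undefined at the origin; 0 by convention
    if x >= 0.0:
        return 2.0 * math.atan(y / (rho + x))    # tan(theta/2) = sin/(1 + cos)
    if y == 0.0:
        return math.pi                           # negative real axis
    return 2.0 * math.atan((rho - x) / y)        # tan(theta/2) = (1 - cos)/sin
```

Because θ/2 never leaves the principal range, no quadrant correction is needed after the call to arctan.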
Calculation of ordinary and modified Bessel functions of the first
kind for small values of the argument involves the evaluation of an
infinite series and is accurate when |z| is small or the order is very
large compared to |z|.
Functions of the second kind are calculated by the Weber-Schlafli
formulas which contain a logarithmic term, an infinite series term, and
a finite series term. In both cases, the terms of the series are summed
from the smallest to the largest to avoid round-off error. The procedure
used is standard.
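
The summation order can be illustrated with the ascending series for J_n(z), whose k-th term is (-1)^k (z/2)^(n+2k) / (k! (n+k)!). The Python sketch below is an illustration, not the BRL subroutine: the function name and the fixed number of terms are assumptions chosen for |z| ≤ 2.5, and the terms are generated first and then added in order of increasing magnitude.

```python
import math

def bessel_j_small(n, z, terms=40):
    """J_n(z) from the ascending power series, summed smallest term first.

    z may be real or complex; n is a non-negative integer.
    """
    half = z / 2.0
    term = half ** n / math.factorial(n)          # k = 0 term
    series_terms = [term]
    for k in range(terms - 1):
        # ratio of successive terms: -(z/2)^2 / ((k+1)(n+k+1))
        term = term * (-(half * half)) / ((k + 1) * (n + k + 1))
        series_terms.append(term)
    total = 0.0
    for t in sorted(series_terms, key=abs):       # smallest magnitude first
        total += t
    return total
```

Adding the small terms first lets them accumulate before they meet the large ones, so fewer of their low-order digits are discarded.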
Ordinary and modified Bessel functions of the first kind for moderate
values of the argument are calculated using downward recurrence
formulas:
To start the recurrence, a sufficiently large order is chosen so
that the series computation for the initial high order J or I function
will be accurate. Recall that the infinite series is accurate if the
order is much larger than the argument. Miller has shown that
cancellation does not occur when stable recurrence relations are used.
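
A minimal sketch of the procedure for the ordinary function, assuming the standard three-term relation J_(n-1)(z) = (2n/z) J_n(z) - J_(n+1)(z), is given below in Python. The function names, the starting order of 40, and the number of series terms are assumptions made for illustration; the starting order must be well above |z|, and z must be nonzero.

```python
import math

def j_series(n, z, terms=40):
    """Ascending series for J_n(z); adequate when n is much larger than |z|."""
    half = z / 2.0
    term = half ** n / math.factorial(n)
    total = term
    for k in range(terms - 1):
        term *= -(half * half) / ((k + 1) * (n + k + 1))
        total += term
    return total

def j_downward(z, start_order=40):
    """Return [J_0(z), ..., J_start_order(z)] by downward recurrence."""
    values = [0.0] * (start_order + 2)
    # two starting values of high order, obtained from the series
    values[start_order + 1] = j_series(start_order + 1, z)
    values[start_order] = j_series(start_order, z)
    for n in range(start_order, 0, -1):
        values[n - 1] = (2 * n / z) * values[n] - values[n + 1]
    return values[: start_order + 1]
```

As a check that needs no tables, the computed values for real z can be substituted into the identity J_0(z)^2 + 2(J_1(z)^2 + J_2(z)^2 + ...) = 1.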

An analogous procedure cannot be used for functions of the second
kind  because  Miller  has  shown  the  recurrence  relations  are  not  stable: 
the  difference  of  two  nearly  like  numbers  occurs  in  the  calculations  of 
functions  of  lower  order. 

The  use  of  the  Gauss  continued  fraction  to  calculate  modified  Bessel 
functions  of  the  second  kind  was  Mr.  Elder's  contribution  to  this  work. 
The  equation 

is related to the hypergeometric equation

discussed by Wall and others. It can be seen immediately that the
modified  K function  can  be  expressed  in  terms  of  the  equation  given  by 
Wall. 

Now if a quotient function Q_n is defined in terms of K_n and K_{n-1}, then
the exponential and constant factors will be eliminated, yielding a ratio
of two hypergeometric functions:

The functions and are forms of the Gauss continued fraction

Since it is of the form 1 + , we need not worry about cancellation.

The  computation  of  these  fractions  is  accomplished  by  an  iterative 
procedure. 

Calculations  of  the  modified  function  of  the  second  kind  are  made 
by  substituting  the  quotient  function  in  the  Wronskian 

Analytic continuation must be used to calculate the correct
functional value for values of z lying in the left half-plane.

Ordinary  Bessel  functions  of  the  second  kind  are  calculated  in 
terms  of  Hankel  functions  which  are  linear  combinations  of  ordinary 
Bessel  functions: 

The J's have already been calculated by recurrence; a method for
evaluating the H functions must be developed. Looking at the equations for the
Hankel functions,

it can be seen that they can be rewritten in terms of the modified
functions just calculated. Substituting similar quotient functions into
Wronskians gives the Hankel functions in terms of the ordinary functions
of the first kind and the quotient function:

Using the values now available for the Hankel functions makes the
computation of ordinary Bessel functions of the second kind a simple matter.

Bessel functions for large values of z are evaluated using Hankel
asymptotic formulas. If the given argument does not lie in the right
half-plane, it is rotated ± 180 degrees to either quadrant I or IV so
that it lies within the range of all the formulas used.

The evaluation of the series expansions P and Q in the asymptotic
formulas is straightforward. Direct substitution of the Hankel values
yields the desired result for the ordinary Bessel functions.

The  modified  Bessel  functions  are  calculated  from  equations 

involving the same series expansions used in the ordinary function
calculations. The methods are standard and certainly do not need
detailed explanation.

Analytic  continuation  formulas  are  used  to  obtain  correct  functional 
values  for  arguments  that  were  rotated  to  the  right  half-plane  in  order 
to  use  these  formulas. 

5. FUTURE WORK AND CHECKING METHODS. There are regions of overlap
between the methods of calculation. Taking advantage of this allowed
some checking of results between the methods. Double precision runs on
the UNIVAC were used to check for round-off error on BRLESC. Results
agree to at least 13 significant figures on the CDC, UNIVAC, and BRLESC.
We are proceeding on a time-available basis to check using other methods
of calculation.

We  are  preparing  a CDC  double  precision  version  of  the  subroutine. 
The  region  in  which  the  Gauss  continued  fraction  is  used  will  be  extended 
past 21.0 since the Hankel asymptotic formulas will not give 20 significant
figures at 21.0. It is expected that similar adjustments will
have  to  be  made  in  other  constants. 

We  would  like  to  extend  the  program  to  include  real  orders;  this 
will  be  done  if  any  interest  is  expressed.  Eventually  the  work  will 
include Kummer and Whittaker functions of complex argument.

A more  detailed  description  of  the  coding  and  formulas  used  is 
contained  in  the  ARRADCOM  Special  Publication  "User's  Manual  for  the 
BRL  Subroutine  to  Calculate  Bessel  Functions  of  Integral  Order  and 
Complex  Argument."  The  report  is  expected  to  be  published  this  Spring 
and  will  be  available  through  the  Defense  Documentation  Center. 
ABSTRACT


Recent estimates show that 50% or more of all software costs accrue
after  the  software  has  been  written.  It  is,  therefore,  important  to  be 
able  to  repair  software  easily  and  efficiently. 

This  paper  details  the  interior  command  structure  to  perform  a 
semantic  update  for  FORTRAN  software.  Semantic  updating  is  oriented 
towards  systems  of  programs  which  are  interrelated.  The  effects  of  a 
semantic  update  are  felt  through  an  entire  module  (sub-program)  and  often 
throughout  an  entire  system  (program).  Semantic  updating  operates  on  an 
entire  statement  (command)  rather  than  a line  (card  image)  treated  by 
classical  (existing,  traditional)  updating  systems.  A semantic  updating 
system tends towards a program understanding system (although it is not
in  the  class  of  a full-blown  artificial  intelligence  program),  as  opposed 
to  text  editors  which  tend  toward  generalized  word  processors  which  are 
applicable  to  any  file.  The  current  design  covers  seventy-two  commands. 
It  can  deal  with  side  effects.  No  exterior  command  system  has  been 
designed;  however,  an  implementation  is  being  considered  for  a Control 
Data  Corporation  Cyber  System.  It  is  estimated  that  once  implemented,  a 
semantic  updating  system  would  realize  an  order  of  magnitude  or  greater 
savings  over  update  systems  currently  in  use. 
A SEMANTIC UPDATING SYSTEM


FOR  REPAIRING  SOFTWARE 


1 INTRODUCTION 

In a 1973 article¹ Boehm showed that roughly 70 percent of all
computer  costs  were  software  related  and  predicted  software  costs  would 
continue  to  rise  to  roughly  90  percent  of  all  computer  costs  by  1985 
(Figure  1).  To  date  there  has  been  no  evidence  presented  to  contradict 
Boehm's  estimates.  Moreover,  recent  articles  have  estimated  software 
maintenance costs to be between 40 and 60 percent of all software costs.²,³,⁴
Mills  predicted  that  "unless  radical  new  methods  are  found,  maintenance 
will  go  even  higher  in  its  demands  and  will  very  nearly  stifle  further 
development"  (Ref.  3,  page  80). 

Structured  programming,^  now  in  its  second  decade,  has  not  been  the 
panacea needed to significantly reduce maintenance costs. In fact, Kraft
claims  structuring  techniques  have  so  routinized  programming  that  we  have 
developed  a work  force  of  white-collar  clerks  who  in  general  are  not  as 
skilled  as  programmers  were  before  structuring.  Similarly,  top-down 
programming  may  be  very  good  for  design  but  in  practice  is  not  used  in 
coding  by  sophisticated  programmers . ' 


1 Boehm, B.W., "Software and Its Impact: A Quantitative Assessment,"
Datamation, May 1973, pp. 48-59.

2 Boehm, B.W., "The High Cost of Software," in Practical Strategies for
Developing Large Software, Addison Wesley, 1975 (12 pages).

3 Mills, H.D., "Software Development," Proceedings 2nd International
Conference on Software Engineering Supplement, October 1976, pp. 79-86.

4 Putnam, L.H. and Wolverton, R.W., "Quantitative Management: Software Cost
Estimating," IEEE Computer Society, Publication EH0-129-7, November 1977.

There are, however, new ideas now emerging in Artificial Intelligence
(AI) which can significantly reduce software costs and software maintenance
(Section 3). It is perhaps paradoxical that Artificial Intelligence,
which fosters so much work in the area of natural languages, has come full
circle and is now generating concepts for dealing with artificial languages,
namely, programming languages. The new concepts in Artificial Intelligence,
coupled with classical card/file update (Section 2), have led to the concept
of a semantic updating system for repairing software (Section 4).

A semantic update system differs from the classical update system by
automatically searching for, and in most cases performing, side-effect
analyses that result from programmer-specified changes. Thus a programmer
using  a semantic  update  system  can  express  himself  in  terms  that  relate  to 
the  modification  of  a program,  rather  than  the  alteration  of  lines  or 
card  images. 

The  basic  notion  of  a semantic  update  system  can  be  understood 
by  accepting  a change  in  one's  point  of  view.  While  a classic  update 
system  deals  with  "decks  of  cards",  a semantic  update  system  recognizes 
that  changes  are  made  to  systems  that  are  composed  of  programs  that  are 
related  to  each  other.  As  a consequence,  program  modifications  that  have 
appreciable  side  effects  are  now  a possibility  rather  than  a situation  to 
be  avoided  because  of  the  extent  and  cost  of  all  of  the  changes  that  have 
to  be  made. 

Thus,  the  object  of  a semantic  update  system  is  to  remove  some  of  the 
basic  barriers  that  exist  between  programmers'  intuitions  about  software 
systems  and  thinking  that  is  based  solely  on  the  physical  form  they  take 
(i.e.,  cards,  continuation  rules,  positional  information,  etc.).  When  such 
barriers  are  removed  it  can  be  expected  that  programmers  would  be  more  able 
to  concentrate  on  significant  problems  of  text  modification  at  a semantic 
rather  than  at  a mechanical  level.  Some  of  the  FORTRAN-based  salient  factors 
of  semantic  updates  are: 

• It  allows  the  user  to  think  in  terms  of  modifications  to 
programs, rather than in terms of changes to "lines" or
card-images.

• It  is  oriented  towards  large  software  systems  with  many 
interrelated  modules. 

• Its  fundamental  units  are  FORTRAN  statements,  not  card-images. 

• It  applies  to  programs  written  in  ANSI  FORTRAN  although 
later  versions  may  extend  to  other  languages. 
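
The difference between card-images and statements can be made concrete with a small sketch. The Python function below is hypothetical and is not part of the system described here; it assumes the usual fixed-form layout (a C or * in column 1 marks a comment, columns 1-5 hold the statement label, a character other than blank or zero in column 6 marks a continuation, and columns 7-72 hold the statement text) and groups card-images into whole statements. Joining continuation text with a single blank is a simplification.

```python
def group_statements(cards):
    """Group fixed-form FORTRAN card-images into statements.

    Returns a list of (label, text, card_indices) tuples, one per statement.
    """
    statements = []
    for index, card in enumerate(cards):
        card = card.rstrip("\n").ljust(72)[:72]    # ignore columns 73-80
        if card.strip() == "" or card[0] in "Cc*":
            continue                               # blank card or comment card
        body = card[6:72].strip()                  # columns 7-72
        if card[5] not in " 0" and statements:
            statements[-1][1] += " " + body        # continuation of previous statement
            statements[-1][2].append(index)
        else:
            label = card[0:5].strip()              # columns 1-5
            statements.append([label, body, [index]])
    return [(label, text, indices) for label, text, indices in statements]
```

An update expressed against the statement AREA = 3.14159 * R * R then applies to both of the cards that carry it, whichever card the change happens to fall on.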

This paper describes the characterization of the semantic update and
its  expected  gain  as  a maintenance  tool.  More  than  that,  a semantic 
updating  system  is  a realizable  effort  which  can  be  completed  in  the  near 
term,  providing  insights  for  several  of  the  Artificial  Intelligence  proposals 
which  are  now  paper  studies  and  likely  to  be  a decade  away. 

2 CLASSICAL  UPDATING  SYSTEMS 

There  are  two  large  classes  of  maintenance  problems  which  have  been 
handled  with  classical  card/file  updating  systems: 

a.  Improved  designs 

b.  Error  correction 

In  the  case  of  improved  design,  an  operational  program  is  modified  to 
account  for  new  algorithms,  different  parameteri zations , or  implementation 
on  a new  machine.  On  the  other  hand,  errors  may  have  been  produced 
because  of  design  (logic  and  algorithm  errors),  or  coding  (syntax  and 
semantic  errors). 

The  tool  which  has  been  used  for  both  improved  design  and  error 
correction  for  nearly  twenty  years  is  the  classical  card/file 
system  (Figure  2). 

The  history  of  update  systems  (here  meant  in  a general  way  to  include 
any  system  used  to  assist  in  management  of  a large  source  program)  is 
rooted  in  the  very  earliest  generalized  batch  processing  operating  systems. 

In  that  enviornment  it  was  first  possible  to  maintain  large  software 
systems  on  magnetic  tape  as  card  images  rather  than  explicitly  as  physical 
card  records.  In  many  cases  a "hard  copy"  of  the  tape  was  kept  as  back-up. 

Vestiges  of  the  earliest  use  of  update  systems  could  still  be  seen  in 
the  late  1970's  when  interest  in  systematizing  source  code  control  processes 
with  configuration  control  procedures  first  became  of  wide  interest.  Needed 
configuration  control  usually  was  achieved  by  mixing  manual  record  keeping 
procedures  with  controlled  use  of  a basic  update  system. 
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It  employs  a static  analysis  of  the  program;  i.e.,  it  avoids 
executing  the  program. 

It  applies  to  an  entire  program  or  to  selected  parts  of  a 
program. 

Trial  updates  can  be  performed  with  side-effects  reported 
to  the  user. 

Programmer  knowledge  about  the  software  is  supported  by  a 
series  of  reports  thus  providing  automated  documentation. 


Figure  2:  Basic  Structure  of  a Classical  Card/File  Update  System 


The classic update system is characterized by a number of relatively
standard features:


1.  Systems  were  designed  to  operate  in  a batch  environment 
and  were  specifically  intended  to  process  card  images 
(i.e.,  either  source  programs  or  source  data  card  images). 

2. Systems provided only an elementary "include facility"
(e.g., the *COMPKG/*CALL facility of CDC Update^8) for
incorporating defined text.

In  the  earliest  operating  environments  object  code  was  produced 
fresh  each  time  the  source  program  was  compiled.  Later,  tools  specifically 
designed  to  minimize  the  overhead  of  recompilation  were  developed  so  that 
it  was  no  longer  necessary  to  provide  a complete  copy  of  the  updated 
software  system  source  text  each  time  a change  was  made. 

Standard  operating  modes  for  a classical  batch-oriented  update  system 
are  described  first  in  general  terms  and  then  in  terms  of  generally 
accepted  ground  rules  of  operation.  The  ground  rules  take  the  form  of  a 
series  of  assumptions  about  the  way  the  update  system's  capabilities  are 
employed  in  managing  the  source  form  of  the  program  set  (or  software 
system), and also relating to the normal way in which object-code versions
of the program text are handled.

2.1 Classical Update Facility

Figure  2 illustrates  a conventional  batch  oriented  update  system. 
There  is  an  OLD  MASTER,  which  stores  the  previous  version  of  the  software 
system.  In  operation,  a user  provides  update  commands  intermixed  with  new 
source  text  statements  to  produce:  (1)  a COMPILE  file  that  is  used  in 
the  current  run,  and  (2)  an  optional  NEW  MASTER  that  incorporates  all  of 
the  changes  made  to  the  system  in  a way  that  would  ultimately  permit 
backing  up  if  that  were  found  necessary. 

The  update  system  consumes  some  online  working  storage  as  it 
processes  the  set  of  user  commands  and  associated  program  text  against  the 
OLD  MASTER  file.  In  addition  to  compilable  code  there  is  update  system 
output  that  reports  on  the  actions  taken  and  identifies  any  exceptions 
(warnings  or  errors)  that  were  found  during  the  update  process. 
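
The flow of Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the command tuples ("delete", "insert"), the card identifiers, and the function name apply_update are hypothetical, and they do not reproduce the directive syntax of CDC UPDATE or of any other real update system.

```python
def apply_update(old_master, commands):
    """Apply line-oriented update commands to an OLD MASTER.

    old_master: list of (card_id, text) pairs, one per card image.
    commands:   list of tuples, either ("delete", card_id) or
                ("insert", card_id, [new_text, ...]), meaning "insert after card_id".
    Returns (new_master, report); the OLD MASTER itself is never modified.
    """
    known = {card_id for card_id, _ in old_master}
    deleted, inserts, report = set(), {}, []
    for cmd in commands:
        op, target = cmd[0], cmd[1]
        if target not in known:
            report.append(f"WARNING: unknown card {target}")
            continue
        if op == "delete":
            deleted.add(target)
        elif op == "insert":
            inserts.setdefault(target, []).extend(cmd[2])
        else:
            report.append(f"WARNING: unknown command {op}")
    new_master, serial = [], 0
    for card_id, text in old_master:
        if card_id not in deleted:
            new_master.append((card_id, text))
        for new_text in inserts.get(card_id, []):
            serial += 1
            new_master.append((f"NEW.{serial}", new_text))
    return new_master, report


old = [("A.1", "      X = 1.0"), ("A.2", "      Y = 2.0"), ("A.3", "      END")]
new, report = apply_update(old, [("delete", "A.2"),
                                 ("insert", "A.1", ["      Y = 3.0"]),
                                 ("delete", "Z.9")])
compile_file = [text for _, text in new]   # what the compiler would see
```

Because the OLD MASTER is left untouched, the run can be repeated or abandoned without loss, which is the property the operating ground rules rely on.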

Periodically  the  OLD  MASTER  is  replaced  by  the  NEW  MASTER;  this  action 
constitutes  one  update  cycle.  A software  system  may  proceed  through  many 
update  cycles  between  the  start  of  the  development  activity  and  release  of 
the final version of the program. Normally the most recent OLD MASTER
contains the backup master copy of the software system.

8 Control Data Corporation, Update Manual, Form 84000016, March 1976.

In certain situations, when it is necessary to create an OLD MASTER
from a source program in some other format, a special conversion program
is used that converts a program from the COMPILE file format to whatever
is needed for the update system to operate.

2.2 Operating Ground Rules

It  is  important  to  characterize  the  typical  operating  environment 
that  surrounds  an  update  system.  What  follows  is  a set  of  statements 
that  represent  good  procedures  and  good  judgment  in  use  of  a classical 
update  system. 

(1)  It  is  usually  assumed  that  the  NEW  MASTER  replaces  the  OLD 
MASTER  only  infrequently,  namely  whenever  the  set  of  changes 
made  to  the  source  program  on  the  OLD  MASTER  becomes  large 
enough  to  justify  the  cycle. 

(2)  There  must  be  no  capacity  limits  on  the  OLD  MASTER/NEW  MASTER 
file  formats  so  that  the  update  system  can  be  used  for  large 
applications.

(3)  Usually,  the  number  of  changes  is  a small  percentage  (typically 
around  5%)  of  the  total  content  of  the  OLD  MASTER. 

(4)  There  must  be  an  option  in  the  update  system  that  generates 
a full  set  of  newly  updated  programs  for  the  COMPILE  file. 

In  normal  operation,  however,  only  the  modified  programs 
are  issued  to  the  COMPILE  file  for  compilation  and  selective 
replacement  of  previously  compiled  object  programs. 

(5)  A change  to  a program  that  does  not  pass  muster  when  compiled 
(from  the  COMPILE  file)  ordinarily  does  not  cause  loss  of  the 
run  since  the  compiler  typically  issues  no  object  code  and  the 
load  process  continues  forward  without  the  missing  module 
(using,  typically,  the  previous  version).  Ultimately,  of 
course,  all  changes  specified  by  the  user  must  make  sense  to 
the  compiler. 

(6)  The  update  system  must  not  fail  to  produce  something  regardless 
of  the  sense  of  the  user  input  (or  even  the  lack  of  an 
appropriate  OLD  MASTER).  Otherwise,  there  is  no  place  to  start. 

(7)  Changes  to  any  module  maintained  in  the  OLD  MASTER  are  either: 

(1)  small  modifications  to  individual  statement/lines,  or 

(2) a complete replacement/deletion of a module or modules.

(8) Most update systems have incompatibility problems that would
be  solved  if  the  basic  format  for  updating  could  be  generated 
automatically  from  a full  COMPILE  file  version  of  the  program 
(the  conversion  program  box).  Most  classical  update  systems 
are  deficient  in  this  regard. 

2.3  Object  Library 

In addition to maintaining an accurate copy of the source text of
a software system, in many situations it is quite desirable to keep a
related copy of the relocatable object text -- one that corresponds to the source.
This need is usually based on the realization that it may be too expensive
to regenerate all of the object code for a software system each time a
change is made. Instead, only those elements of the software system that
are changed are actually recompiled.

Figure  3 shows  the  way  this  is  accomplished,  using  a library  edit 
program.  The  object  texts  of  each  recompiled  module  are  merged  with  the 
OLD  LIBRARY  to  produce  a NEW  LIBRARY.  The  merge  takes  place  with  the  rule 
that  an  object  module  on  the  OBJECT  FILE  is  always  added  to  the  NEW 
LIBRARY;  if  there  is  a previous  copy  on  the  OLD  LIBRARY  that  copy  is 
ignored.  The  result  is  a NEW  LIBRARY  that  contains  all  of  the  most-recently 
changed  object  code. 
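
The merge rule just stated amounts to a keyed replacement. A minimal sketch follows, assuming that each library can be modeled as a mapping from module name to object text; the function name merge_library and the sample module names are hypothetical.

```python
def merge_library(old_library, object_file):
    """Return a NEW LIBRARY: modules on the OBJECT FILE always win,
    and any previous copy on the OLD LIBRARY is ignored."""
    new_library = dict(old_library)      # start from the OLD LIBRARY
    new_library.update(object_file)      # recompiled modules replace old copies
    return new_library


old_library = {"MAIN": "obj-v1", "SOLVE": "obj-v1", "PRINT": "obj-v1"}
object_file = {"SOLVE": "obj-v2", "PLOT": "obj-v1"}   # only recompiled modules
new_library = merge_library(old_library, object_file)
```

The OLD LIBRARY is copied rather than altered in place, so it remains available as a backup if the library edit step fails.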

In  normal  operation  the  combination  of  the  contents  of  the  OLD  MASTER 
and the OLD LIBRARY represents the best source of an executable software
system.  The  Update  system  and  the  Library  Edit  system  are  used  in  concert 
to  make  modifications  as  necessary.  In  some  environments  it  is  possible  to 
save  the  actual  bound  object  text  (called  the  absolute  binary  file,  or 
the  executable  object  file)  in  addition;  this  would  be  done  for  a 
frequently  used  system  to  avoid  the  expense  of  repeated  link/edit/load 
activities. 

2.4 Archiving Procedures

A final  topic  necessary  to  understanding  the  operating  environment 
for  update  systems  and  their  allied  tools  centers  on  the  procedures  used  to 
assure  a secure  backup  in  case  any  of  the  previously  described  procedures 
fail.  For  example,  coherent  knowledge  about  a software  system  would  be 
irretrievably  lost  if  the  OLD  MASTER  and  NEW  MASTER  were  assigned  as  the 
same  file  (this  would  require  write-access  to  the  file  since  the  NEW 
MASTER  is  generated  by  the  update  system)  and  if  a fatal  error  occurred 
during  the  update  activity. 

The situation is similar, but not nearly so dangerous, during the
library update step. At least in this case, if there are errors, the OLD
MASTER will contain a fairly recent program text that can be used to
generate a complete set of object modules; in turn, they can be used to
regenerate the OLD LIBRARY.


Figure 3: Maintenance of an Object Library File


Typical  operating  environments  minimize  any  impact  of  a failure  in 
one  or  more  of  the  tools  by  (a)  assuring  there  is  a backup  copy  of  the 
OLD  MASTER  and/or  the  OLD  LIBRARY  stored  in  a safe  place,  and  (b)  keeping 
the complete sequence of update operations on file. This typically
requires keeping a copy of each of the OLD MASTER files in the archive;
normally only the most recently generated OLD LIBRARY is archived. This
assures that a catastrophic failure will "cost" only whatever effort is
involved in moving from the current OLD MASTER to the current NEW MASTER.

3 ARTIFICIAL  INTELLIGENCE  AND  PROGRAM  MAINTENANCE 

Among the many activities in Artificial Intelligence (AI) research
have been the design of software which is error-resistant and can repair
software^9 and the development of systems which exhibit understanding
about computer programs.

In  a program  understanding  system,  deduction  is  performed  by  the 
system  in  order  to  provide  a knowledge  base  from  which  questions  asked 
by  the  user  can  be  answered.  This  situation  is  analogous  to  a theorem- 
proving system  in  the  following  way:  in  a theorem-proving  system  the 
axioms  represent  the  knowledge  required  for  theorem-proving. 

The work of Waters^10 is the development of an automatic program
understanding system oriented to the processing of FORTRAN programs.
Waters investigates the structure and function of a system that
"understands" individual programs by operating within a plan for them. The
plans  are  developed  by  analysis  of  the  source  code  of  the  program,  plus 
assertions  about  the  program. 

The  proposed  system  (which  at  this  writing  is  not  known  to  have  been 
implemented) is intended to answer users' questions about candidate
programs;  some  of  the  questions  that  could  be  answered  take  the  following 
form: 


• "What" questions, which relate to the problem-independent
role individual statements, segments and/or modules have
in a large software system.

• "How" questions that deal with explanations of the internal
workings of individual pieces of a program.

• "Why" questions which require answers stating the reasons
program particles were included in the program text.

9 Yau, S.S., Cheung, R.C., and Cochrane, D.C., "An Approach to
Error-Resistant Software Design," Proceedings 2nd International Conference on
Software Engineering, IEEE Catalog No. 76CH1125-4C, pp. 429-436.

10 Waters, R.C., "A System for Understanding Mathematical FORTRAN Programs,"
MIT, Artificial Intelligence Laboratory, AIM-368, August 1976.

Waters'  arguments  for  this  kind  of  question-answering  system  are  lengthy 
and  persuasive.  The  early  investigations  of  this  kind  of  system  are 
important  because  they  will  potentially  lead  in  the  direction  of 
identifying  useful  program  processing  functions  that  can  be  implemented  in 
practical  systems,  as  well  as  examining  in  detail  the  underlying  notions 
of  generalized  program  analyzers. 

A more general thrust of AI research is exemplified by the work of
Bobrow and Winograd^11. They examine the current state of program
understanding systems (such systems are not necessarily limited to treating
computer  programs).  Ultimately,  such  systems  will  provide  for  understanding 
of  natural  language;  they  are  consequently  critically  important  as  the 
prototype  for  man/machine  interface  systems. 

A typical  knowledge  representation  language  treats  understanding  in 
terms  of  sub-domains  such  as: 

• Task domains that describe the nature of activities a system
being described is supposed to perform.

• Linguistic domains, i.e., syntactic and semantic analyses of
descriptive statements.

• Common sense domains, which include primitive statements (like
axioms) that would be universal relative to some systems.

• Strategy domains, which include descriptions of primitive operating
procedures for intermingling of task/linguistic/common-sense
domain constructs.

Although the AI-related projects just described are quite abstract,
they are significant in two ways. First, they provide encouragement that
in the long term there will be fairly sophisticated methods for handling
knowledge about programs in a useful way. Second, they suggest the
possibility that limited-capability systems which can be brought into
reality now would have a very significant effect on the ease of managing
complex software systems.


11 Bobrow, D.G., and Winograd, T., "An Overview of KRL, A Knowledge
Representation Language," Stanford University, Artificial Intelligence
Laboratory, AIM-293, November 1976.

The key to making this work, of course, is adopting the appropriate
context  of  knowledge.  In  the  remainder  of  this  report,  the  knowledge 
base  will  generally  be  taken  as  meaning  the  body  of  information  about  a 
FORTRAN  software  system  that  is  large  and  complex.  This  knowledge  base 
is  thus  founded  in  these  two  facets  of  a FORTRAN  software  system: 

(1)  The  semantic  content  of  actual  FORTRAN  programs  arranged 
into  a software  system. 

(2)  The  auxiliary  information  about  the  software  system  design 
that  is  probably  not  included  in  the  actual  semantics  (but 
which  may  be  included  in  the  documentation). 

As  subsequent  sections  will  demonstrate,  the  relationship  between 
fairly  naive  (but  certainly  non-trivial)  forms  of  semantic  update 
capability  and  more-general  knowledge-based  processing  of  source  programs 
is  a very  close  one. 

Figure  4 illustrates  a final  point  about  the  role  of  a semantic 
update system. The figure shows two major routes of growth for update
systems:  in  the  direction  of  sophisticated  editors,  and  in  the  direction 
of  program  understanding  systems.  The  ultimate  evolution  for  editor- 
based  systems  is  a fully  generalized  word  processor. 

4 THE  SEMANTIC  UPDATING  SYSTEM 

Figure  5 characterizes  the  primary  differences  between  a classical 
update  system,  a semantic  update  system,  and  a general  program  understanding 
system.  The  salient  features  will  be  described  below. 

The  atomic  element  for  a semantic  update  system  is  a whole  statement, 
or  a command  to  the  system  (expressed  as  a whole  statement  also).  By 
comparison,  the  input  to  the  classical  update  system  is  a single  card- 
image,  or  line. 
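
The difference between the two atomic elements can be made concrete. The sketch below groups card images into whole statements, assuming the standard fixed-form FORTRAN layout (a "C" in column 1 marks a comment, columns 1-5 hold the label, a character other than blank or zero in column 6 marks a continuation, and columns 7-72 hold the statement text). The function name cards_to_statements and the sample cards are hypothetical.

```python
def cards_to_statements(cards):
    """Group fixed-form FORTRAN card images into whole statements.

    Returns a list of (label, text) pairs, one per statement; continuation
    cards are folded into the statement they continue.
    """
    statements = []
    for card in cards:
        card = card.ljust(72)[:72]            # ignore columns 73-80
        if card[0] == "C" or not card.strip():
            continue                          # comment or blank card
        body = card[6:72].strip()
        if card[5] not in " 0" and statements:
            label, text = statements[-1]      # continuation of previous statement
            statements[-1] = (label, text + body)
        else:
            statements.append((card[0:5].strip(), body))
    return statements


cards = [
    "C     COMPUTE THE SUM",
    "   10 TOTAL = A + B +",
    "     1        C",
    "      GO TO 10",
]
statements = cards_to_statements(cards)
```

Four card images yield two statements here, and a command addressed to statement 10 need not know how many cards it occupies.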

A second  factor  concerns  the  extent  to  which  it  is  possible  in  a 
semantic  update  system  for  individual  commands  to  affect  other  parts  of 
the  text  being  processed.  For  example,  in  a classical  update  system  a 
command  can  refer  only  to  an  individual  line,  but  in  a semantic  update 
system  a command  could  affect  the  entire  software  system. 

A third  area  involves  the  facilities  provided  by  the  semantic  update 
system for controlling the structure of a large software system. Classical
update  systems  provide  almost  no  support  (the  exception  is  the  define/ 
include  facility  for  blocks  of  card  images).  A semantic  update  system 
would  make  it  possible  to  define  major  elements  of  a software  system  and 
base subsequent commands on such definitions.

4.1 Information Flow

It  is  now  possible  to  discuss  the  general  structure  of  a semantic 
update  system,  in  terms  of  the  information  flow  its  use  would  represent  for 
a user.  Figure  6 shows  the  basic  information  flow  relationship  for  the 
conceptual  system;  a user  operates  in  a context  defined  by  the  following 
forms  of  information: 

• System  Status  Reports,  which  tell  the  user  about  the  current 
state  of  the  software  system.  These  reports  provide  fairly 
general  and  statistical  data  about  the  system  such  as  the  number 
of modules, the total number of statements, and the number of
defined entities.

• System  Structure  Reports,  which  provide  information  about  the 
imposed  and  natural  structures  within  the  software  system. 

• Definitions  reports,  which  provide  current  copies  of  all 
defined  entities. 

• Annotated  listings  of  the  programs  which  are  currently  being 
modified,  or  are  otherwise  of  special  interest.  This  set  of 
programs  could  include  all  of  the  modules  known  to  the  update 
system.


In  addition  to  the  reports  provided  by  the  semantic  update  system,  the 
user  also  has  access  to  the  normal  compiler  output.  (It  may  not  be 
necessary  to  use  all  of  the  compiler  output  when  using  semantic  updating.) 


Thus,  a user's  role  is  to  examine  this  information  and,  combined  with 
his  knowledge  of  the  intended  form  of  the  software  system,  construct  (or 
modify)  commands  that  bring  the  current  system  "closer"  to  that  desired. 

It  should  be  clear  that  the  success  of  a semantic  update  system  is  based 
wholly  on  its  ability  to  fully  satisfy  the  information  needs  of  the  user. 

4.2 Ground Rules

Clearly  the  notion  of  semantic  update  is  machine  and  programming 
language  dependent;  it  is  possible  that  semantic  update  systems  could  be 
built  that  are  application  dependent  also.  It  is  important  at  the  outset 
to identify some ground rules that will keep the thinking specific enough
to  permit  considering  real  systems.  Some  of  these  are: 

• We  will  assume  a FORTRAN  environment  throughout.  This  means 
that  the  "semantics"  of  the  underlying  programming  language 
will be those of the FORTRAN programming language. It is to
be expected that at least some of the conceptualizations
will  reflect  the  flavor  of  FORTRAN  directly. 

• The context of application is large codes, written entirely in
FORTRAN and intended for use on a CDC 6000 series computer.

• The baseline classical update system will be considered the
equivalent of both CDC's UPDATE system and IBM's equivalent^12.

4.3 General Structure

Figure  7 provides  a detailed  view  of  the  major  components  of  the 
semantic  update  system.  Each  component  will  be  discussed  in  detail  below. 

The  file  control  serves  as  the  basic  interface  between  the  semantic 
update  system  and  the  files  it  must  process.  These  files  include  the 
OLD  MASTER,  the  NEW  MASTER  (which  may  not  be  used  in  every  case),  and  the 
COMPILE  file.  The  role  of  the  file  control  element  of  the  system  is  to 
provide  all  essential  interpretation  services  needed  to  expand  and/or 
contract  information  stored  on  such  files  (primarily,  here,  the  OLD 
MASTER).  During  operation  it  will  be  convenient  to  have  less-compressed 
forms  of  data  than  would  ordinarily  be  used  in  permanent  storage  format  on, 
say, the OLD MASTER file.

The  update  process  may  require  the  analysis  of  individual 
statements  and  reassembly  with  modified  tokens,  expressions,  strings,  and 
so  forth.  The  local  work  area  manager  provides  a resource  that  makes 
manipulation  of  that  form  simple  and  straightforward. 

The  function  of  the  program  COMMON  manager  is  to  keep  track  of  the 
set  of  definitions  that  pertain  to  a software  system  stored  on  the  OLD 
MASTER  and  provide  copies  of  the  appropriate  statements  whenever  a module 
requests  them.  To  do  this  requires  that  the  source  text  of  individual 
program  COMMON  statements  be  stored  on  a run-time  database  (constructed  for 
each  update  run  from  information  stored  on  the  OLD  MASTER).  Whenever  a 
particular COMMON is required the text is copied into the appropriate locations.

The  COMMON  manager  also  analyzes  new  COMMON  statements  for  consistency 
with  existing  definitions.  This  is  done  in  concert  with  the  FORTRAN 
statement  recognizer  (see  below). 
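
The two duties of the COMMON manager (supplying copies and checking consistency) can be sketched as follows. The class name CommonManager, its methods, and the block name GRID are hypothetical, and the consistency test shown (an exact match of the variable list) is an assumption; the system described here might apply a looser or stricter rule.

```python
class CommonManager:
    """Keep one definition per named COMMON block and hand out copies."""

    def __init__(self):
        self._blocks = {}   # block name -> tuple of variable names

    def define(self, name, variables):
        """Record a definition; reject one that conflicts with an existing one."""
        variables = tuple(variables)
        existing = self._blocks.get(name)
        if existing is not None and existing != variables:
            raise ValueError(f"COMMON /{name}/ conflicts with existing definition")
        self._blocks[name] = variables

    def statement(self, name):
        """Return the source text to copy into a requesting module."""
        return f"      COMMON /{name}/ " + ", ".join(self._blocks[name])


manager = CommonManager()
manager.define("GRID", ["NX", "NY", "DX"])
text = manager.statement("GRID")
try:
    manager.define("GRID", ["NX", "NY"])   # inconsistent redefinition
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Every module that requests GRID receives the same text, so a change to the definition is made in exactly one place.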

12 IBM Corporation, IBM OS/VS Utilities, IEBUPDTE User's Manual,

As previously mentioned, the semantic update system has a conditional
assembly feature that permits a user to parameterize the inclusion (or
exclusion) of individual statements, blocks of statements, or even whole
subsystems during generation of the COMPILE file. The parameter manager
keeps  track  of  the  currently  defined  update  parameters,  whether  or  not  they 
have  values,  and  provides  value  information  whenever  requested  by  the  main 
control  program. 

In operation the parameter manager must monitor the control stream
in detail, but need respond only when parameter values are encountered.
(It is an implementation detail to determine how the parameters would be
distinguished.) Whenever an expression involving parameters can be
evaluated,  that  is  when  all  of  the  parameters  currently  have  a value, 
then  such  an  expression  is  evaluated.  Based  on  its  value  the  controlled 
block  of  statements  is  either  included  or  excluded  from  the  COMPILE  file. 
The  principle  of  delay  of  parameter  evaluation  is  assumed;  this  requires 
that  such  evaluations  be  virtually  the  last  activity  before  the  COMPILE 
file  images  are  generated. 
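
The principle of delayed evaluation can be illustrated with a short sketch. It assumes that each controlled block carries its condition as a function of the parameter values; the names select_blocks, DEBUG, and MACHINE are hypothetical, as is the representation of blocks as Python tuples.

```python
def select_blocks(blocks, parameters):
    """Decide, just before the COMPILE file is written, which blocks to keep.

    blocks:     list of (condition, statements); condition is None for
                unconditional text, or a function of the parameter dict.
    parameters: dict of parameter name -> value (all set by this point).
    """
    compile_file = []
    for condition, statements in blocks:
        if condition is None or condition(parameters):
            compile_file.extend(statements)
    return compile_file


parameters = {}
blocks = [
    (None, ["      CALL SETUP"]),
    (lambda p: p["DEBUG"] == 1, ["      CALL DUMP"]),
    (lambda p: p["MACHINE"] == "CDC6600", ["      CALL CDCIO"]),
]
# Parameter values may arrive anywhere in the control stream ...
parameters["DEBUG"] = 0
parameters["MACHINE"] = "CDC6600"
# ... and evaluation is delayed until the COMPILE file is generated.
compile_file = select_blocks(blocks, parameters)
```

Because the conditions are stored unevaluated, a parameter may be given its value after the block that depends on it has already been read.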

The  user  has  the  option  to  define  source-text  macro  capabilities 
within  the  FORTRAN  software  system  being  developed.  Such  a macro  might 
specify  the  form  of  a series  of  statements  that  is  encountered  often 
enough to justify the definition, or may include several macros nested
within one another. The macro processor (a) maintains all currently
defined macro definitions, and (b) supplies them to the main control
program when necessary. As is the case for the COMMON manager, the
macro processor needs to have access to the online database that stores
current  information  about  the  software  system  being  altered. 
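
Nested macro definitions suggest a recursive expansion. The sketch below assumes a hypothetical call syntax in which a line of the form *NAME invokes macro NAME; the paper does not specify the actual syntax, and the function name expand and the depth limit of 20 are likewise assumptions.

```python
def expand(lines, macros, depth=0):
    """Expand macro calls, including macros nested inside other macros.

    A line of the form '*NAME' is replaced by the body of macro NAME;
    all other lines are copied unchanged.
    """
    if depth > 20:
        raise RecursionError("macro nesting too deep (circular definition?)")
    out = []
    for line in lines:
        name = line[1:] if line.startswith("*") else None
        if name in macros:
            out.extend(expand(macros[name], macros, depth + 1))
        else:
            out.append(line)
    return out


macros = {
    "ZERO": ["      A = 0.0", "      B = 0.0"],
    "INIT": ["*ZERO", "      N = 1"],          # INIT includes ZERO
}
text = expand(["*INIT", "      CALL RUN"], macros)
```

The depth limit guards against a macro that directly or indirectly includes itself.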

The  purpose  of  the  semantic  update  system's  report  generator/manager 
is  to  centralize  production  of  various  kinds  of  reports.  (See  discussion 
of  Class  10  commands  in  the  next  section.)  All  reports  that  will  be  seen 
by  the  user  are  handled  by  the  report  generator  function,  which  takes 
care  of  details  of  page  formatting,  line  editing,  and  related  activities. 

The  system  archiving  facility  provides  a capability  for  developing 
detailed  archives  of  system  production.  This  archive  would  be  in  addition 
to  that  ordinarily  generated  during  a series  of  update  cycles  between  the 
OLD  MASTER  and  the  NEW  MASTER  files.  The  information  that  would  be 
preserved  would  consist  essentially  of  a complete  history  of  all  actions 
taken  during  the  life  of  a system  that  is  managed  by  the  semantic  update 
facility.  In  addition  to  generating  the  file,  this  element  of  the  system 
also  can  produce  reports  that  display  the  entire  sequence  of  programmer 
actions. 

The  role  of  the  command/text  stream  processor  is  to  act  as  an 
intelligent  filter  on  all  user-specified  commands.  Statements  encountered 
on  the  command  stream  would  either  be  interpreted  directly  by  the  update 
system  command  processor,  or  would  be  recognized  by  the  FORTRAN  statement 
recognizer.  The  response  of  this  element  would  be  to  direct  processing 
to a state that corresponds to one of the forms that appears as a
command class; in other words, the exterior syntax scanning for semantic
update  commands,  as  well  as  control  of  the  FORTRAN  statement  recognizer, 
is  vested  in  the  input  processor. 

Because the input stream may involve FORTRAN statements, and because
certain other semantic update actions will require the recognition (but
not a full parse) of such statements, there must be a facility for type
classification and lexical analysis of FORTRAN statements. This facility
is closely allied
in  capability  with  the  side-effect  analyzer  (discussed  next).  The  input 
to  the  statement  recognizer  is  a FORTRAN  statement.  The  output 
would  typically  be  (a)  a partially  "cracked"  version  of  the  statement  in 
which  the  major  elements  of  the  text  stream  were  decomposed  at  least  in 
part,  and  (b)  the  value  of  the  statement  type.  Note  that  some  FORTRAN 
statements  involve  parts  of  more  than  one  "type"  so  the  "type"  that  is 
returned  may  have  more  than  one  component  value. 
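
A recognizer of this kind classifies without parsing. The following sketch handles only a handful of statement types and returns the type as a tuple so that a logical IF can report its embedded statement as a second component. The function name classify and the type names are hypothetical. Since FORTRAN has no reserved words, assignments are tested before keywords; even so, this simple classifier can be fooled (for example, by an array named IF).

```python
import re


def classify(statement):
    """Return a tuple of statement types for one FORTRAN statement."""
    text = statement.replace(" ", "").upper()   # blanks are not significant
    if text.startswith("IF("):
        depth = 0
        for i, ch in enumerate(text):           # find the matching ")"
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    rest = text[i + 1:]
                    break
        else:
            return ("UNKNOWN",)                 # unbalanced parentheses
        if re.fullmatch(r"\d+,\d+,\d+", rest):
            return ("ARITHMETIC-IF",)
        return ("IF",) + classify(rest)         # logical IF: two components
    if re.match(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,", text):
        return ("DO",)
    if re.match(r"[A-Z][A-Z0-9]*(\(.*\))?=", text):
        return ("ASSIGNMENT",)
    for keyword in ("GOTO", "CALL", "COMMON", "DIMENSION", "CONTINUE", "RETURN"):
        if text.startswith(keyword):
            return (keyword,)
    return ("UNKNOWN",)


two_part = classify("IF (X .GT. 0.0) GO TO 20")
loop = classify("DO 10 I = 1, 5")
tricky = classify("DO 10 I = 1.5")              # an assignment to DO10I
```

The last two calls differ only in a comma and a period, which is why the DO test must look past the equals sign before deciding.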

The  purpose  of  the  side-effect  analyzer  is  to  determine  the  extent 
of  side  effects  that  a particular  semantic  update  command  would  have  under 
the  assumptions  that  (1)  the  command  is  the  only  one  being  processed  for 
the  given  module,  and  (2)  there  is  sufficient  information  about  the  module 
to  let  the  side-effect  analyzer  operate  normally.  The  information  that 
is  required  is  a function  of  the  complexity  of  the  module:  the  more 
complex  the  module  the  more  difficult  the  side-effect  analysis  is  going  to  be. 

In  general,  the  side-effect  analyzer  will  need  to  have  access  to 
rather complete structural and partial-semantic information about the
module  in  question.  This  requires  a level  of  analysis  somewhat  greater 
than  that  needed  simply  for  program  common  analysis  and/or  parameter 
analysis.  However,  the  information  does  not  approach  that  needed  by  a 
compiler.  It  is  important  to  note  that  the  amount  of  information  is  also 
a function  of  the  specific  update  command.  For  example,  if  the  update 
command  specified  the  alteration  of  a program  label  then  only  label 
information  would  be  necessary  to  decide  the  extent  of  side  effects.  For 
other  commands  different  information  would  be  needed.  It  would  seem 
reasonable  to  organize  the  side  effect  analyzer  so  that  it  generated  the 
information  it  needed  on  an  ad  hoc  basis,  developing  only  that  which  is 
necessary.  Such  an  approach  implies  that  the  information  would  be  volatile 
in  the  sense  it  would  not  be  saved  from  update  run  to  update  run  but  would 
be  regenerated  as  necessary. 
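
The label example can be sketched directly. The function below gathers, on demand, only the label information that a "change label" command needs and reports which statements of a module would be affected. It is a minimal sketch under stated assumptions: the name label_side_effects is hypothetical, statements are given as (label, text) pairs, and only GO TO, DO, and arithmetic IF references are searched for (computed GO TO, for instance, is not handled).

```python
import re


def label_side_effects(statements, old_label):
    """Return the indices of statements that define or reference old_label."""
    patterns = [
        rf"GOTO{old_label}$",                   # unconditional GO TO
        rf"^DO{old_label}[A-Z]",                # DO loop terminal label
        rf"\)(\d+,)*{old_label}(,\d+)*$",       # arithmetic IF branch
    ]
    affected = []
    for index, (label, text) in enumerate(statements):
        squeezed = text.replace(" ", "").upper()
        if label == old_label or any(re.search(p, squeezed) for p in patterns):
            affected.append(index)
    return affected


module = [
    ("", "IF (N) 10, 20, 30"),
    ("10", "X = 1.0"),
    ("", "GO TO 110"),
    ("20", "GO TO 10"),
    ("30", "CONTINUE"),
]
affected = label_side_effects(module, "10")
```

Nothing is stored between calls, which matches the volatile, regenerate-as-needed organization suggested above.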

All  the  components  illustrated  in  Figure  7 and  discussed  in  the 
prior  paragraphs  operate  around  the  major  control  loop.  This  loop  is 
shown  in  the  flow  diagram  of  Figure  8.  Individual  commands  are  received 
and analyzed, and included FORTRAN statements or statement fragments are
recognized.  Next,  the  appropriate  statements  that  are  going  to  be 
analyzed  are  retrieved,  either  from  the  database  of  definitions  and  related 
information,  or  from  the  OLD  MASTER  copy  of  the  module  embryo,  and  then 
recognized (if necessary). The retrieval/recognition process produces
enough information to permit selection of the appropriate sub-process
within the semantic update system; after the sub-process finishes
executing the operations required based on the user's command, a check
is made to determine whether there are remaining commands. This cycle
continues until all of the user's commands have been processed. Note
that this may or may not require accessing the complete OLD MASTER file;
only those modules which are affected need be treated in detail.


Figure 8: Basic Control Loop for Semantic Update System
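
The control loop of Figure 8 can be summarized in a short sketch. The command tuples, the handler table, and the name run_update are hypothetical; the sketch assumes that the OLD MASTER can be modeled as a mapping from module name to a list of statements and that each sub-process is a function returning the revised statement list.

```python
def run_update(commands, old_master, handlers):
    """Process every user command in turn.

    commands:   list of (kind, module_name, argument) tuples.
    old_master: dict of module name -> list of statements.
    handlers:   dict of command kind -> function(statements, argument).
    Only modules named by some command are ever retrieved.
    """
    touched = {}
    for kind, module_name, argument in commands:
        if module_name not in touched:                 # retrieve on first use
            touched[module_name] = list(old_master[module_name])
        sub_process = handlers[kind]                   # select the sub-process
        touched[module_name] = sub_process(touched[module_name], argument)
    return touched                                     # modules for the COMPILE file


handlers = {
    "delete": lambda stmts, arg: [s for s in stmts if s != arg],
    "replace": lambda stmts, arg: [arg[1] if s == arg[0] else s for s in stmts],
}
old_master = {
    "MAIN": ["CALL DEBUG", "CALL SOLVE", "END"],
    "SOLVE": ["X = 1.0", "RETURN", "END"],
    "PRINT": ["RETURN", "END"],
}
commands = [
    ("replace", "SOLVE", ("X = 1.0", "X = 2.0")),
    ("delete", "MAIN", "CALL DEBUG"),
]
result = run_update(commands, old_master, handlers)
```

Module PRINT is never retrieved because no command names it, which illustrates the closing remark that only affected modules need be treated in detail.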

4.4  Command  Classes 


Figure  9 gives  the  major  classes  of  commands,  and  points  out  the 
main  identifying  feature  of  each  class.  Some  of  the  classes  affect  the 
overall  behavior  of  the  semantic  update  process,  while  other  commands 
are  directly  related  to  system-level  structures  for  the  software  maintained 
on the OLD MASTER. The overall relationships between the commands on a
by-class basis are depicted in Figure 10.

Class  1 commands  control  the  type  of  processing  to  be  done  by  the 
semantic  update  system,  and  include  such  data  as  the  following: 

• Identification of the OLD MASTER FILE.

• Identification of the programmer so his access rights can be
  checked.

• Control what output is created by the system. This includes
  the NEW MASTER FILE, compiler output, reports, and whether
  the update is a temporary or permanent one.

• Establish new systems.

Class  2 and  3 commands  control  modifications  made  to  modules  known 
within  the  semantic  update  system.  Class  2 commands  are  those  which 
affect  a single  module  and  which  have  side-effects  that  are  limited  to 
that  module  alone.  Class  3 commands  also  affect  a single  module,  but 
have  side  effects  that  extend  beyond  the  module  at  which  the  change 
originates. 

The  commands  in  Class  2 and  Class  3 involve  the  following  primitive 
kinds  of  actions: 

• Add  text  to  the  program. 

• Delete  text  from  the  program. 

• Change  the  program. 

In  some  cases  a "change"  is  the  equivalent  of  a combined  add-delete  pair, 
and  in  other  cases  a change  operation  is  more  complicated  than  the 
combination. 
CLASS                                                            NUMBER OF
NUMBER   DESCRIPTION                                              COMMANDS
                                                                  IN CLASS

  1      COMMANDS TO THE SEMANTIC UPDATE SYSTEM
  2      COMMANDS THAT AFFECT A SINGLE MODULE AND
         HAVE NO GLOBAL SIDE EFFECTS
  3      COMMANDS THAT AFFECT A SINGLE MODULE AND
         HAVE GLOBAL SIDE EFFECTS
         (counts printed for Classes 1-3: 7, 25)
  4      COMMANDS THAT AFFECT PROGRAM COMMON DECLARATIONS             9
  5      COMMANDS THAT AFFECT MODULE DEFINITIONS, INCLUDING
         THE NAME AND PARAMETER LIST
  6      COMMANDS THAT AFFECT GLOBAL DEFINITIONS (EXCLUDING
         PROGRAM COMMON) (GENERAL MACRO)
  7      COMMANDS THAT AFFECT SYSTEM STRUCTURE                        7
  8      COMMANDS THAT AFFECT CONDITIONAL PROCESSING                  3
  9      COMMANDS THAT AFFECT SYSTEM ARCHIVING FEATURES               3
 10      COMMANDS THAT REPORT SYSTEM STATUS                          10


Figure 9: Major Classes of Semantic Update Commands
Class 4 commands affect program COMMON declarations, which are all
centrally coordinated by the system to assure that they are identical
in every module. Commands in this class permit a user to:

• Create/delete  common  blocks. 

• Rename  common  blocks. 

• Add/delete variables from common blocks.

• Rearrange  the  order  of  the  variables  in  a common  block. 

• Move  a variable  from  one  common  block  to  another. 

• Include/delete  a common  block  from  a module. 

Class  5 commands  affect  the  "definitions"  of  modules.  Here,  a 
module's  definition  is  taken  to  mean  the  name  of  the  module  and  its 
list  of  calling  parameters.  Class  5 commands  allow  the  user  to: 

• Change  the  name  of  a module. 

• Add/delete  variables  from  the  list  of  calling  parameters. 

• Rearrange  the  order  of  the  parameters. 

The  Class  6 commands  provide  the  semantic  update  system  with  a 
generalized  macro  processing  capability.  Unlike  many  macro  generators  for 
programs,  the  one  included  in  the  semantic  update  system  operates  at  the 
source-program  level.  The  commands  in  Class  6 permit  the  user  to: 

• Define  a new  macro. 

• Alter  an  existing  macro. 

• Delete  a macro. 

For macros that are defined by Class 6 commands the user need only reference
them elsewhere in the software system in order to take advantage of this
powerful capability.

Class  7 commands  define  logical  subsystems  or  parts  of  the  entire 
software  system  that  are  used  to  modify  the  action  of  certain  other 
commands. Subsystem names are similar to module names but can refer to a
set of several modules. Class 7 commands permit a user to:

• Define/delete a subsystem.

• Add/delete  a module  from  a sub-system. 

• Add modules to a subsystem until it contains all the
  modules it calls.

• Apply subsequent commands to a subsystem.

These commands would make it possible to define major elements of a
software system and base subsequent commands on such definitions.
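
Adding modules to a subsystem "until it contains all the modules it calls" amounts to a transitive closure over the call graph. A minimal Python sketch, assuming the call graph is available as a mapping from each module to the modules it calls (the function and module names are illustrative only):

```python
def close_subsystem(members, calls):
    """Return the subsystem extended with every module it calls, directly or not.

    members: iterable of module names already in the subsystem
    calls:   dict mapping a module name to the names of the modules it calls
    """
    closed = set(members)
    worklist = list(closed)
    while worklist:
        module = worklist.pop()
        for callee in calls.get(module, ()):
            if callee not in closed:      # first time this callee is seen
                closed.add(callee)
                worklist.append(callee)   # its own callees must be added too
    return closed
```

Callers of a member are not pulled in; only callees are, so a subsystem rooted at a low-level module stays small.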

The  Class  8 commands  provide  the  user  with  the  capability  to  control 
the  conditional  text  generation  features  of  the  semantic  update  system. 
Basically, a user has parametric control^13 over selection of source text at
the  final  COMPILE-file  generation  point.  Based  on  the  current  values  of 
run-time  parameters  the  system  selects  previously  identified  statements 
for  inclusion  and/or  exclusion.  Class  8 commands  include  the  capability  to: 

• Set  the  actual  value  of  a generation-time  parameter. 

• Declare a new parameter.

• Delete a previously used parameter.
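
The conditional text generation that Class 8 commands control might look roughly like the following Python sketch. The representation of "previously identified statements" as (condition, text) pairs and the parameter name `PRECISION` are assumptions made for the example.

```python
def generate_compile_text(tagged_lines, params):
    """Select source lines for the COMPILE file from generation-time parameters.

    tagged_lines: list of (condition, text) pairs; condition is None for
                  unconditional text, or a (name, value) pair meaning
                  "include only when params[name] == value".
    params:       dict of declared generation-time parameters and their values.
    """
    selected = []
    for condition, text in tagged_lines:
        if condition is None:
            selected.append(text)
            continue
        name, wanted = condition
        if name not in params:
            raise KeyError(f"undeclared generation-time parameter {name}")
        if params[name] == wanted:
            selected.append(text)
    return selected
```

Setting a parameter to a different value and regenerating yields a different COMPILE file from the same master text.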

Class  9 commands  deal  with  the  archiving  function  of  the  semantic 
update  facility.  The  archive  produced  contains  a complete  record  of  all 
transactions  made;  this  record  is  likely  to  be  more  comprehensive  than 
the  changes  that  would  reflect  through  an  OLD  MASTER/NEW  MASTER  cycle, 
where  the  primary  concern  is  generation  of  the  "new"  software  system 
representation.  The  user  has  the  capability,  with  Class  9 commands,  to 
indicate  the  beginning  and  end  of  epochs  in  the  history  archive,  as  well 
as  to  instruct  the  system  to  update  the  history  archive. 

Class  10  commands  specify  which  system-level  reports  are  to  be 
generated  during  the  current  semantic  update  run.  A variety  of  reports, 
intended  to  be  of  use  to  the  programmer,  can  be  generated.  Primarily 
these  involve  reference-format  listings  of  various  system  tables, 
definitions,  etc. 

4.5  A Sample  Command 

Figure 11 illustrates a sample Class 3 command, which would be used to
increase or decrease the maximum values that a variable's subscripts
can assume.
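
The behavior of such a command can be approximated in a few lines of Python. This sketch is hedged throughout: the regular expressions recognize only simple declarations, the message texts are borrowed from Figure 11, and the rule that non-constant subscripts are flagged along with constants above the new limit is an assumption of the example.

```python
import re

DECLARATION = re.compile(r"^\s*(REAL|INTEGER|DIMENSION|COMMON)\b", re.IGNORECASE)


def _suspect(subscript_text, new_bounds):
    """True if a subscript is non-constant or exceeds its new limit."""
    subscripts = [s.strip() for s in subscript_text.split(",")]
    return any(not s.isdigit() or int(s) > bound
               for s, bound in zip(subscripts, new_bounds))


def size_subscript(lines, name, new_bounds):
    """Re-dimension `name` in declarations and flag questionable references."""
    reference = re.compile(rf"\b{re.escape(name)}\s*\(([^()]*)\)", re.IGNORECASE)
    replacement = f"{name}({','.join(str(b) for b in new_bounds)})"
    out, messages, found = [], [], False
    for number, line in enumerate(lines, start=1):
        matches = list(reference.finditer(line))
        if matches:
            found = True
        if matches and DECLARATION.match(line):
            for m in matches:
                if len(m.group(1).split(",")) != len(new_bounds):
                    return list(lines), [
                        "E1 THE SUBSCRIPT LIST CONTAINS THE WRONG NUMBER OF ENTRIES"]
            line = reference.sub(lambda m: replacement, line)
        elif any(_suspect(m.group(1), new_bounds) for m in matches):
            messages.append(f"W3 line {number}: {line.strip()}")
        out.append(line)
    if not found:
        messages.append("W1 NO INSTANCE OF VARIABLE FOUND")
    return out, messages
```

The flagged lines are reported rather than changed, in keeping with the idea that side effects are brought to the programmer's attention.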

5 DESIGN  CONSIDERATIONS 

This  section  presents  some  considerations  that  are  important  in  the 
design  of  the  semantic  update  facility.  The  considerations  relate  both 
to  features  of  FORTRAN  software  systems  that  can  be  exploited  effectively 
by  the  semantic  update  system,  and  to  a fairly  general  characterization 
of  the  FORTRAN  environment. 

It is important to appreciate the simplifications that can result
from assuming that the software will be written in a specific programming
language. While it would be inappropriate to argue the merits and
shortcomings of FORTRAN, it certainly can be considered to be a language that
is widespread. In addition, there is a substantial wealth of applications
software already coded in FORTRAN. Hence, basing the design of a semantic
update system on the semantics of FORTRAN may be important for practical
reasons, even if it is not an elegant notion.

^13 Kruskal, V.J., "An Editor for Parametric Programs," IBM Research Report,
RC-6070, June 1976.

Figure 4: Classes of Source Text Processors

SYNTAX:

    SIZE-SUBSCRIPT VARIABLE (SUBSCRIPT-LIST)

GENERAL EXPLANATION:

    INCREASE OR DECREASE MAXIMUM VALUES SUBSCRIPTS
    ARE ALLOWED TO TAKE.

ERRORS AND WARNINGS:

    W1  NO INSTANCE OF VARIABLE FOUND

    W2  NO SUBSYSTEM NAMED (ASSUMED TO APPLY TO THE TOTAL SYSTEM)

    W3  A PROGRAM ERROR MAY HAVE BEEN CREATED

    E1  THE SUBSCRIPT LIST CONTAINS THE WRONG NUMBER OF
        ENTRIES

SIDE EFFECTS:

    A DECREASE COULD AFFECT ALL VALUES WHERE HIGHER LIMITS
    ARE USED. THESE ARE FLAGGED FOR PROGRAMMER ATTENTION.

EXAMPLE AND EXPLANATION:

    SIZE SUBSCRIPT A (6,6)

    VARIABLE A (3,12) IS CHANGED TO A (6,6) IN ALL DECLARATION
    STATEMENTS. STATEMENTS SUCH AS A (2,9) = 10 OR A (2,I) =
    I + 1 ARE FLAGGED.


Figure 11: A Sample Command with SIDE EFFECTS


One  of  the  areas  in  which  there  can  be  significant  savings  is  in  the 
realm  of  statement  recognition.  As  the  command  structures  have  suggested, 
it  will  ordinarily  be  necessary  to  know  something  about  the  statement 
that  is  submitted  to  the  system.  This  knowledge  would  vary  with  the  type 
of  statement:  some  statement  types  would  require  more  than  others.  For 
example, if an insert command is applied to an assignment statement then it
is likely that there are no side effects; the semantic update system needs
only to know that the statement is an assignment, since it can then
skip any conflict analysis.

The continued references made to commands applying to statements
deserve some comment. Figure 12 illustrates a hierarchy for FORTRAN
software  systems  and  illustrates  an  important  point.  As  the  hierarchical 
structure  shows  there  is  a very  orderly  progression  from  "software  system" 
down  to  module,  but  sub-module  partitioning  yields  two  alternative  routes 
to  the  next-joined  primitive,  the  statement.  Furthermore,  the  relationship 
between  the  logical  entity  "statement"  and  its  physical  constituents  (i.e., 
the  real  world  implementations)  is  more  complex.  There  seems  to  be  no 
advantage  for  a semantic  update  system  to  operate  in  a disorderly 
structure;  hence,  as  shown  in  Figure  12,  semantic-based  operations  are 
restricted  to  apply  only  to  the  statement  level  and  above.  Making  this 
restriction  imposes  only  the  difficulty  of  dealing  with  blocks  vs.  segments. 
A segment  is  a logically  inseparable  sequence  of  statements  that  is 
always  executed  as  a unit;  a block  is  a structurally  contiguous  sequence 
of statements whose execution is controlled by the same predicate. (The
interiors of if...else...end if constructions are blocks.) The difficulty
arises  because  segments  need  not  involve  statements  that  are  contiguous; 
for  this  reason,  semantic  update  commands  that  alter  control  flow  will  be 
particularly  difficult  to  deal  with  effectively,  since  the  system  would 
have  to  deal  with  unconnected  sequences  of  source  text. 

The hierarchy shown in Figure 12 is also of importance in helping
determine the extent to which conflict resolution would have to proceed.
For example, we have already seen that commands that change the names of
single variables within a module do not have side-effects elsewhere.
However, modifications to the parameter lists of one module can, depending
on the structure of the software system, affect a very large number of
other modules. The hierarchy will help determine the limits to such
side-effect analyses.

Because  the  semantic  update  system  provides  a different  view  of 
FORTRAN  programs  it  is  important  to  have  a framework  in  which  to  visualize 
the  effects  the  system  would  have  under  certain  programming  circumstances. 
This  framework  must  be  based  on  the  "typical"  structure  of  FORTRAN  programs. 

Figure 12: A Hierarchy for FORTRAN Software Systems

Figure 13 illustrates some of the attributes of a FORTRAN module in
a generic format that will serve as the needed framework. As the figure
indicates, it is possible to impose a rough classification on the statements
in a FORTRAN module. Note that it is not necessary in the FORTRAN language
that all of the declaration statements be in the particular order implied
in Figure 13; the ordering presented there, however, does represent good
programming practice.

The  FORTRAN  module  opens  with  a definition  of  its  name  and  formal 
parameters.  This  defines  the  kind  of  module  and  specifies  the  globally 
understood  information  about  it:  the  number  of  parameters  it  expects  when 
it  is  invoked.  Note  that  the  FORTRAN  language,  like  many  others,  can 
impose  no  specific  restrictions  on  the  correspondence  between  actual  and 
formal  parameters.  This  fact  obtains  primarily  because  the  compilers  for 
FORTRAN are incremental in nature; that is, they encounter only one
subroutine or function at a time and thus cannot tell whether a particular
invocation  is  valid  or  not. 

Next  come  the  definitions  of  the  locally-understood  types,  dimensionality, 
and  extent  of  the  actual  parameters.  These  definitions  apply,  of  course, 
only  to  the  current  module.  It  is  important  to  note  that  the  actual 
definitions  made  cannot  be  in  conflict  with  either  a program  COMMON 
declaration  or  with  an  internal  function  definition. 

The  next  part  of  the  module  consists  of  references  to  one  or  more 
program  COMMON  statements,  thereby  establishing  intercommunication  between 
the  module  and  others  which  also  access  the  same  COMMON  statements. 

After  the  COMMON  statements  come  the  local  definitions/declarations  that 
apply  throughout  the  module.  As  for  formal  parameters,  such  declarations 
cannot  be  in  conflict  with  COMMON  variables. 

The  next  feature  is  a set  of  initializations  of  local  variables.  (It 
is  not  legal  to  initialize  either  a COMMON  variable,  except  in  a BLOCK 
DATA program, or a formal parameter.) All of the initializations must
precede the first executable statement. Normally internal function
definitions  are  stated  just  prior  to  the  first  executable  statement.  The 
remaining  parts  of  the  FORTRAN  module  are  the  live  or  executable  statements; 
they  form  the  body  of  the  module. 

SUBROUTINE or FUNCTION name (formal-parameter-list)

    THIS STATEMENT DEFINES THE MODULE NAME AND CALLING STRUCTURE

    declarations regarding the formal parameters

        (i)   name of formal parameter

        (ii)  type of formal parameter

        (iii) extent of formal parameter

COMMON /name/ list

    THIS STATEMENT DEFINES THE COMMON BLOCK(S) TO BE USED BY THE
    MODULE

    each common block provides name, type, and extent information
    for a number of program variables

DECLARATIONS name(extent)...

    THIS STATEMENT DEFINES LOCAL VARIABLES (NEITHER COMMON NOR
    FORMAL PARAMETERS)

    each local name is typed, and has extent and dimensionality
    information as well

INITIALIZATION...

    THESE DATA STATEMENTS PROVIDE FOR INITIALIZATION OF ALL LOCAL
    VARIABLES

INTERNAL FUNCTION DEFINITION(s)...

    THIS STATEMENT PROVIDES FOR DEFINITION OF LOCAL FUNCTIONS (IF
    ANY ARE USED)

...BODY

    THIS IS THE BODY OF THE MODULE

END


Figure 13: Typical Contents of a FORTRAN Module

A final design-related note concerns the logical processing structure
of the semantic update system. Because it is likely that only a very small
number of modules will be altered during each session, or within each
command set, it will be important to design the semantic update system so
that the largest file is accessed only once. This means that the system must
be built so that it makes a single pass through the OLD MASTER file, which is
likely to be many hundreds of times larger than the complete command file.
It also means that all of the processing for a module must be completed
before the fully-generated version of that module is emitted to
the COMPILE file. Consequently, the semantic update system will probably
have to make several passes over each module as it is presented from the
OLD MASTER file for processing.
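
One way to picture this single-pass organization is the Python sketch below. It assumes the small command file is indexed by module before the large file is read; the function name, the data layout, and the `apply` callback are hypothetical.

```python
from collections import defaultdict


def single_pass_update(old_master, commands, apply):
    """Read OLD MASTER once, finishing each module before emitting it.

    old_master: iterable of (module_name, lines), consumed sequentially
    commands:   list of (module_name, command) from the small command file
    apply:      function (lines, command) -> new lines; it may scan the
                module text as many times as it needs
    Returns the COMPILE-file contents and the modules named by commands
    but not found on OLD MASTER.
    """
    by_module = defaultdict(list)
    for module, command in commands:        # index the command file up front
        by_module[module].append(command)

    compile_file = []
    for module, lines in old_master:        # the only pass over the large file
        for command in by_module.pop(module, []):
            lines = apply(lines, command)   # several passes over one module
        compile_file.append((module, lines))
    return compile_file, sorted(by_module)
```

Because `old_master` is only iterated once, it can be a sequential reader over a tape or file rather than an in-memory structure.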

6 DISCUSSION 

We have presented the flavor of the semantic update system. Complete
documentation appears in Miller and Praninskas^14,15.

We described a system being developed to modify complex software with
ease and reliability. It is designed to examine a program and make
modifications to it based on instructions it receives. It reports
extensively on what it did and the problems and conflicts it encountered.

It  differs  from  the  traditional  update  system  in  its  knowledge  of  the 
programming  language,  in  this  case  FORTRAN,  and  its  ability  to  make 
multiple  and  varied  changes  to  the  software  based  on  a single  user  command. 
Traditional  updating  systems  do  not  perform  the  most  elementary  form  of 
checking,  not  even  to  see  if  the  line  being  replaced  is  exactly  the 
same  as  the  replacement  line. 

Cost  savings  using  semantic  updating  compared  with  conventional 
methods  are  expected  to  be  significant.  The  simplifications  arise  primarily 
through  the  automatic  identification  of  side-effects.  Because  maintenance 
represents  40-60%  of  the  life-cycle  cost,  a 10:1  reduction  in  maintenance 
translates  into  an  approximate  50%  savings  overall.  (There  is  reason  to 
believe  a 10:1  reduction  is  low  based  on  early  manual  estimates  of  the  number 
of steps saved through semantic updating.) An important aspect of future
work  will  be  detailed  quantification  of  benefit/cost  ratios. 
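
The arithmetic behind the savings estimate can be checked directly. If maintenance is a fraction m of life-cycle cost and is reduced by a factor k, the overall saving is m(1 - 1/k). The Python lines below take the paper's figures at face value; the function name is introduced only for this illustration.

```python
def life_cycle_savings(maintenance_share, reduction_factor):
    """Fraction of total life-cycle cost saved when the maintenance cost
    is divided by reduction_factor and all other costs are unchanged."""
    return maintenance_share * (1.0 - 1.0 / reduction_factor)


low = life_cycle_savings(0.40, 10)    # maintenance is 40% of life-cycle cost
high = life_cycle_savings(0.60, 10)   # maintenance is 60% of life-cycle cost
```

With a 10:1 reduction the saving lies between about 36% and 54% of life-cycle cost, which brackets the approximate 50% quoted above.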

The semantic update system is one of a genre of program understanding
systems^16 which has evolved from recent ideas in artificial intelligence.
It is a realizable project for program modifications and software repair.
No assertions need accompany modification^17. Furthermore, a semantic
update system can be operational within a year or so and provide insights
for some of the systems mentioned which may take a decade to develop if
indeed they are even undertaken.


^14 Miller, E.F. Jr., and Praninskas, J.S., "Semantic Update Systems -
A Conceptual Analysis," RP-104, Software Research Associates, San
Francisco, California, October 1977.

^15 Miller, E.F. Jr., and Praninskas, J.S., "Semantic Update Systems -
A System Specification," RP-105, Software Research Associates, San
Francisco, California, November 1977.

^16 Green, C.C., "Progress Report on Program Understanding Systems,"
Stanford University, AIM 240, August 1974.

^17 Dershowitz, N. and Manna, Z., "The Evolution of Programs: A System
for Automatic Program Modification," Stanford University, Artificial
Intelligence Laboratory, AIM-294, December 1976.

SOFTWARE RESTYLING IN GRAPHICS AND PROGRAMING LANGUAGES


Eric Grosse

Computer Science Department
Stanford  University 
Stanford  CA  94305 


ABSTRACT.  The  value  of  large  software  products  can  be  cheaply 
increased  by  adding  restyled  interfaces  that  attract  new  users. 
As  examples  of  this  approach,  a set  of  graphics  primitives  and  a 
language  precompiler  for  scientific  computation  are  described. 
These  two  systems  include  a general  user-defined  coordinate 
system  instead  of  numerous  system  settings,  indention  to  specify 
block  structure,  a modified  indexing  convention  for  array 
parameters,  a syntax  for  n-and-a-half-times-'round  loops,  and 
engineering  format  for  real  constants:  most  of  all,  they  strive 
to  be  as  small  as  possible. 


0.0 PHILOSOPHY. Kernighan and Plauger [1976] describe
explicitly and by example three precepts of the Software Tools
philosophy:

- trim out the inessentials
- build it adaptively
- let someone else do the hard part

Two  more  examples,  driven  by  the  same  philosophy,  are  given 
below.  The  basic  idea  is  to  obtain  high  leverage  by  taking  an 
existing,  powerful  piece  of  software  and  make  it  useful  to  more 
people  by  designing  a new  interface.  Webster's  calls  this 
process  facelifting:  "a  restyling  intended  to  increase  comfort 
or  salability." 


1.0 JUSTIFICATION FOR STILL ANOTHER PROGRAMING LANGUAGE.
Fortran  will  no  doubt  remain  for  many  years  the  most  important 
programming  language  for  scientific  computation.  When  used 
carefully  and  with  discipline,  it  yields  remarkably  portable 
codes;  this  is  its  greatest  virtue.  But,  as  programmers  have 
complained  for  years,  it  also  has  many  faults: 

- awkward  syntax  for  statements,  strings,  names 

- primitive  control  structures 

- DO  loop  restrictions 

- no  macros 

Fortran preprocessors, such as MORTRAN [Cook+Shustek 1975], have
eliminated many of these disadvantages and therefore have become
very popular. Unfortunately, they reduce portability somewhat,
since either the preprocessor must be installed at the new site
or illegible 'object' Fortran sent there. More importantly, such
preprocessors have only a minor effect on inherent problems of
Fortran:

- dynamic allocation is either unavailable or requires the use
of rather confusing tricks

- no PROCEDURE VARIABLE type

- no STRUCTURE type

(Labelled common blocks, since they do not use the
combinatorial possibilities of procedure parameterization,
are less flexible.)

- no 0-origin indexing

- array bound information is not automatically passed

- no  vector  operations 

- no  recursion 

The PORT library makes such heavy use of dynamic allocation that
it has become one of the most advertised features: "We have
found that use of dynamic storage allocation in PORT leads to
more clearly structured programs, cleaner calling sequences,
improved memory utilization, and better error detection."
[Fox+Hall+Schryer 1977] Adding a stack to Fortran is a messy
affair, however, as shown in figure 1, which contains two
alternate methods in PORT for allocating an INTEGER and REAL
array.


      SUBROUTINE LBB(A,N)
C
      COMMON /CSTAK/DSTAK(500)
C
      DOUBLE PRECISION DSTAK
      INTEGER ISTAK(1000)
      REAL A(1)
      REAL RSTAK(1000)
C
      EQUIVALENCE (DSTAK(1),ISTAK(1))
      EQUIVALENCE (DSTAK(1),RSTAK(1))
C
      II = ISTKGT(2*N,2)
      IR = ISTKGT(N,3)
C
      { code referring to RSTAK(IR+n) and ISTAK(II+m)
        probably ending with code to store the stuff
        from the real scratch storage into array A }
C
      CALL ISTKRL(2)
C
      RETURN
      END


      SUBROUTINE LBB(A,N)
C
      COMMON /CSTAK/DSTAK(500)
C
      DOUBLE PRECISION DSTAK
      INTEGER ISTAK(1000)
      REAL A(1)
      REAL RSTAK(1000)
C
      EQUIVALENCE (DSTAK(1),ISTAK(1))
      EQUIVALENCE (DSTAK(1),RSTAK(1))
C
      II = ISTKGT(2*N,2)
      IR = ISTKGT(N,3)
C
      CALL LIBB(A,ISTAK(II),RSTAK(IR),N)
C
      CALL ISTKRL(2)
C
      RETURN
      END


figure  1 

Other proposals are even more complicated. (After a 7 page
description of DYNOSOR, Huybrechts [1977] states: "This paper
gives only the basic features of the DYNOSOR system. A more
sophisticated use allows the user, once he is familiarized with
the system, to improve greatly the speed of programs using it.")

PL/I,  which  is  now  becoming  fairly  widely  available  in  some  form, 
overcomes  all  these  difficulties.  However,  so  huge  a language 
tends  to  overwhelm  people,  and  because  of  tricky  precision  rules, 
silent  type  conversions  (as  in  I*J=0L;)  , and  the  like,  learning 
only part of the language is dangerous.

Other  languages,  while  beautifully  designed,  have  their  own 
flaws.  For  example,  Algol  N does  not  have  a robust  interface  to 
Fortran;  in  addition  to  this  [Kohilner  1977],  Pascal  places 
painful  restrictions  on  arrays. 


1.1  T.  Thus  another  approach  seems  warranted,  which  can  combine 
the  needed  features  of  PL/I,  the  deliberate  syntax  of  ALGOL,  and 
the low implementation cost of the Fortran preprocessors. Such
an  approach  has  produced  the  language  T,  intended  to  assist  in 
the  implementation  and  documentation  of  algorithms  for  scientific 
computation.  The  principal  aims  have  been  ease  of  reading  and 
writing,  low  implementation  cost,  and  reasonable  efficiency. 

Appendix T gives the formal language proposal, where syntax is
specified using Wirth's proposal [1977]. Since T is similar to
Fortran, Algol 60, and PL/I, a complete specification of the
semantics may be omitted without confusion. To provide the
heuristics behind the design choices and to give an overview of
the language, various aspects of the following example will be
discussed.


TRIPEAK
   # example of T and G systems;
   # various views of the sum of three Gaussian peaks;
   # Eric Grosse  Stanford University

   REAL: AZIM, ELEV,          # VIEWING ANGLES FOR SURFACE PLOT
         RELERR, ABSERR,      # ERROR TOLERANCES FOR ODE
         T, TOUT,             # INDEPENDENT VARIABLES OF TRAJECTORY
         NORMYP               # 2 NORM OF THE GRADIENT
   REAL (2): LL, UR,          # CORNERS OF RECTANGULAR DOMAIN OF FUNCTION
             ORIGIN,          # FOCAL POINT FOR SURFACE PLOT
             X0, SCALE,       # COORDINATE TRANSFORMATION PARAMETERS
             Y, YP            # LOCATION AND GRADIENT FOR TRAJECTORY
   REAL (142): ODEWORK
   INTEGER (5): ODEIWORK
   DEFINE (P, 20)             # density of F samples;
   REAL (-P:P, -P:P): FTABLE
   REAL (3): LEVEL            # CONTOUR LEVELS
   INTEGER: I, J,
            IFLAG             # DIAGNOSTICS FLAG FOR ODE
   STRUCTURE: PARAM
      REAL (3,2): X           # LOCATIONS, HEIGHTS, AND WIDTHS OF PEAKS
      REAL (3): H, W
   STRUCTURE: PF              # PLOT FILE
      INTEGER (500): WORK
   PROCEDURE: GOPEN, GCLOSE, GPICT, GCONT, GSURF, GLTYPE,
              GJUMP, GDRAW, GTRAN1
   FORTRAN PROCEDURE: ODE, DF,
                      STASH
   PROCEDURE () REAL: F

   # SET UP PARAMETERS
   BLANK SEPARATION (2)
   REAL DIGITS (3)
   GET DATA (AZIM, ELEV)
   PUT DATA (AZIM, ELEV)
   X (1,1) := 0
   X (1,2) := 0.5
   X (2,1) := -0.433012702
   X (2,2) := -0.25
   X (3,1) := -X (2,1)
   X (3,2) := X (2,2)
   PUT DATA ARRAY (X)
   GET ARRAY (H)
   PUT DATA ARRAY (H)
   GET ARRAY (W)
   PUT DATA ARRAY (W)
   STASH (X, H, W)
   FOR ( -P <= I <= P )
      Y (1) := FLOAT (I) / P
      FOR ( -P <= J <= P )
         Y (2) := FLOAT (J) / P
         FTABLE (I,J) := F (Y, PARAM)

   # SURFACE PLOT
   GOPEN ('VEP12FF', PF)
   GPICT (PF)
   LL := -1
   UR := 1
   ORIGIN := 0.5
   GSURF (LL, UR, FTABLE, AZIM, ELEV, ORIGIN, 0.25, PF)

   # CONTOUR PLOT
   GPICT (PF)
   SCALE := 0.3333
   X0 := -0.5/SCALE (1)
   GTRAN1 (X0, SCALE, PF)
   GET ARRAY (LEVEL)
   PUT DATA ARRAY (LEVEL)
   GCONT (LL, UR, FTABLE, LEVEL, PF)
   GLTYPE ('DOT', PF)
   GET ARRAY (LEVEL)
   PUT DATA ARRAY (LEVEL)
   GCONT (LL, UR, FTABLE, LEVEL, PF)

   # COMPUTE AND PLOT TRAJECTORY
   RELERR := 10 (-6)
   GLTYPE ('SOLID', PF)
   ABSERR := 10 (-6)
   WHILE ( ¬ END OF INPUT )
      GET ARRAY (Y)
      PUT DATA ARRAY (Y)
      T := 0
      GJUMP (Y, PF)
      IFLAG := 1
      WHILE ( NORMYP > 1 (-3) & 1 <= IFLAG & IFLAG <= 3 )
         TOUT := T + 10 (-3)/NORMYP
         ODE (DF, 2, Y, T, TOUT, RELERR, ABSERR, IFLAG, ODEWORK, ODEIWORK)
         CASE
            2 = IFLAG
               GDRAW (Y, PF)
            3 = IFLAG
               PUT ('ODE DECIDED ERROR TOLERANCES WERE TOO SMALL.')
               PUT ('NEW VALUES:')
               PUT DATA (RELERR, ABSERR)
            ELSE
               PUT ('ODE RETURNED THE ERROR FLAG:')
               PUT DATA (IFLAG)
      FIRST
         DF (T, Y, YP)
         NORMYP := NORM2 (YP)
   GCLOSE (PF)


F ( Y, PARAM ) Z
   REAL (): Y
   REAL: Z, NORMSQ
   STRUCTURE: PARAM
      REAL (3,2): X
      REAL (3): H, W
   INTEGER: I
   Z := 0
   FOR ( 1 <= I <= 3 )
      NORMSQ := (Y (1) - X (I,1))**2 + (Y (2) - X (I,2))**2
      Z := Z + H (I)*EXP (-0.5*W (I)*NORMSQ)


1.2 CONTROL AND OTHER SYNTAX. Perhaps the most striking feature
the Algol veteran sees in this example is the complete absence of
BEGINs and ENDs. Not only is the text indented, but the
indention actually specifies the block structure of the program.
Such a scheme was apparently first proposed by Landin [1966].
Except for an endorsement by Knuth [1974], the idea seems to have
been largely ignored.

Ideally, the text editor would recognize tree-structured
programs [Hansen 1971]. In practice, text editors tend to be line
oriented so that moving lines about in an indented program
requires cumbersome manipulation of leading blanks. Therefore
the current implementation of T uses BEGIN and END lines,
translating to indention on output. Thus the input

   STRUCTURE: PARAM
   ((
   REAL (3,2) : X
   REAL (3) : H, W
   ))

produces the output

   STRUCTURE: PARAM
      REAL (3,2) : X
      REAL (3) : H, W

Whatever  the  implementation,  the  key  idea  is  to  force  the  block 
structure  and  the  indention  to  be  automatically  the  same,  and  to 
reduce  clutter  from  redundant  keywords. 
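
The translation from BEGIN and END lines to indention is simple enough to sketch. The Python below is an illustration only, not the T implementation; it assumes the markers are spelled `((` and `))` as in the example above and that each occupies a line of its own, and the three-column step is an arbitrary choice.

```python
def brackets_to_indention(lines, open_mark="((", close_mark="))", step=3):
    """Replace BEGIN/END marker lines by indention of the enclosed lines."""
    out, depth = [], 0
    for line in lines:
        token = "".join(line.split())      # blanks are insignificant
        if token == open_mark:
            depth += 1                     # BEGIN: indent what follows
        elif token == close_mark:
            if depth == 0:
                raise ValueError("END line without matching BEGIN line")
            depth -= 1                     # END: return to the outer level
        else:
            out.append(" " * (step * depth) + line.strip())
    if depth:
        raise ValueError("BEGIN line without matching END line")
    return out
```

Since the output carries the block structure in its indention alone, block structure and layout cannot disagree.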

Blanks are insignificant outside of strings. Mathematical tables
have long used blanks inside numeric constants, as in
   PI := 3.14159 26535 89793
for readability. Blanks in identifiers also can improve
readability, while reducing the chance of misspelling and easing
the pain of name length restrictions imposed by the local
operating system.

In accordance with the recommendations of Scowen+Wichmann [1973],
comments  start  with  a special  character,  #,  and  run  to  the  end  of 
the  physical  line. 

The small reserved word list eliminates the need for a stropping
convention. The psychological advantages of this approach have
been elaborated by Hansen [1973].

The  form  of  the  assignment  and  procedure  call  statements  follows 
the  clean,  clear  style  of  Algol  60.  To  make  macros  more 
understandable,  their  syntax  and  semantics  match  those  of 
procedures  as  closely  as  possible. 

In  addition  to  normal  statement  sequencing  and  procedure  calls, 
three  control  structures  are  provided.  The  CASE  and  WHILE 
statements  are  illustrated  in  this  typical  program  segment: 

   WHILE ( NORMYP > 1 (-3) & 1 <= IFLAG & IFLAG <= 3 )
      TOUT := T + 10 (-3)/NORMYP
      ODE (DF, 2, Y, T, TOUT, RELERR, ABSERR, IFLAG, ODEWORK, ODEIWORK)
      CASE
         2 = IFLAG
            GDRAW (Y, PF)
         3 = IFLAG
            PUT ('ODE DECIDED ERROR TOLERANCES WERE TOO SMALL.')
            PUT ('NEW VALUES:')
            PUT DATA (RELERR, ABSERR)
         ELSE
            PUT ('ODE RETURNED THE ERROR FLAG:')
            PUT DATA (IFLAG)
   FIRST
      DF (T, Y, YP)
      NORMYP := NORM2 (YP)

The CASE statement is modelled after the conditional expression
of LISP; the boolean expressions are evaluated in sequence until
one evaluates to YES, or until ELSE is encountered. The use of
indention makes it easy to visually find the relevant boolean
expression and the end of the statement.
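
The sequential evaluation rule can be sketched in Python (not T).
This is only an illustration: the helper name `case`, the use of
callables for conditions and actions, and the returned strings are
assumptions made here, not part of T.

```python
def case(branches, otherwise=None):
    """Evaluate (condition, action) pairs in order and run the first
    action whose condition is true; fall back to the ELSE action."""
    for condition, action in branches:
        if condition():
            return action()
    return otherwise() if otherwise is not None else None


def report(iflag):
    # Same shape as the CASE in the program segment above.
    return case(
        [
            (lambda: iflag == 2, lambda: "draw"),
            (lambda: iflag == 3, lambda: "tolerances too small"),
        ],
        otherwise=lambda: "error flag",
    )
```

Conditions after the first true one are never evaluated, which is
the property borrowed from the LISP conditional.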

One unusual feature of the WHILE loops is the optional FIRST
marker, which specifies where the loop is to be entered. In the
example above, the norm of the gradient, NORMYP, is computed
before the loop test is evaluated. Thus the loop condition,
which often provides a valuable hint about the loop invariant,
appears prominently at the top of the loop, and yet the common
n-and-a-half-times-'round loop can still be easily expressed.




The FOR statement adheres as closely as practical to common
mathematical practice.

    FOR ( 1 <= I <= 3 )
        NORMSQ := (Y(1) - X(I,1))**2 + (Y(2) - X(I,2))**2
        Z := Z + H(I)*EXP(-0.5*H(I)*NORMSQ)

Several years' experience with these control constructs has
demonstrated them to be adequately efficient and much easier to
maintain than the alternatives.

Procedure nesting is not used for two reasons. First, textual
nesting that extends over many pages is difficult for a human to
keep track of. Second, programs typically contain several high
level procedures calling a single primitive, so a tree
representation is inappropriate anyway.

By removing the nesting of procedures, however, we worsen the
problem of entry point hiding that arises when combining programs
from many sources into a single library. A solution to this
problem is to have an official name for each procedure, coded
along the lines of IMSL, and also a more mnemonic nickname
(which users can pick for themselves if they like). The macro
processor which is built into T can then be used to change all
occurrences of the nicknames into the corresponding official
names.


1.3 DECLARATIONS. The fundamental scalar types are INTEGER,
REAL, and COMPLEX, from which arrays and structures may be built
up. As the example

    REAL (-P:P, -P:P)

illustrates, general upper and lower bounds are allowed.

The upper bound expression is omitted for a formal array
parameter, so that an appropriate value can be taken from the
length of the corresponding actual array argument. The origin of
an actual array argument need not match the origin of the
corresponding formal array parameter. For example, if the actual
argument A was declared REAL(0:7): A and the formal parameter B
was declared REAL(): B, then B(8) will correspond to A(7). Most
languages, when they allow lower bounds at all, do not permit
this flexibility, which is used in the example program when a
matrix with lower bound -P is passed to a general purpose library
routine which assumes a lower bound of 0.
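
The origin remapping can be illustrated with a small Python sketch.
The class name `FormalArray` and its methods are hypothetical; the
sketch only assumes that a formal array shares the storage of the
actual argument and applies its own lower bound (1 when omitted).

```python
class FormalArray:
    """View of shared storage through a declared lower bound."""

    def __init__(self, storage, lower=1):
        self.storage = storage
        self.lower = lower

    def upper(self):
        # The upper bound is taken from the length of the actual array.
        return self.lower + len(self.storage) - 1

    def __getitem__(self, i):
        k = i - self.lower
        if not 0 <= k < len(self.storage):
            raise IndexError(i)
        return self.storage[k]


storage = [10 * i for i in range(8)]
A = FormalArray(storage, lower=0)   # like REAL(0:7): A
B = FormalArray(storage)            # like REAL(): B, origin 1
```

With these declarations `B[8]` and `A[7]` name the same element.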

Structures of arbitrary depth may be declared. As the examples

    STRUCTURE: PARAM
        REAL (3,2): X
        REAL (3): H, W
    STRUCTURE: PF
        INTEGER (500): WORK

suggest, structures are useful for passing collections of related
data, without the need for long parameter lists. This makes
feasible the prohibition of global variables in a drastic attempt
to narrow and make more explicit the interface between
procedures. Euclid [Popek+others 1977] has emphasized the
importance of visibility of names.

The  graphics  procedures  which  use  the  WORK  vector  of  the  example 
are  able  to  divide  up  the  space  into  convenient  units.  This 
capability,  which  would  be  possible  in  PL/I  only  through  the  use 
of  pointers,  encourages  information  hiding  and  abstraction. 

PROCEDURE VARIABLES allow the names of procedures to be saved, an
essential feature for applications like the user-specified
coordinate transformation described in the graphics system below.

The  importance  of  existing  Fortran  software  is  recognized  by 
providing  for  FORTRAN  PROCEDURES  as  an  integral  part  of  the 
language.  The  current  implementation  of  T performs  this  linkage 
in  a more  efficient  way  than  the  naive  user  of  PL/I  would  be 
likely  to  discover. 

A novel syntax is introduced for function returns. Since
procedures may be recursive, Fortran's convention of using the
function name as variable cannot be followed. Instead, the
procedure header declares a return variable just like any other
parameter:

    F ( Y, PARAM ) Z
        REAL (): Y
        REAL: Z


1.4 INPUT/OUTPUT. Beginners often find Fortran's input/output
the most difficult part of the language, and even seasoned
programmers are tempted to just print unlabelled numbers, often
to more digits than justified by the problem, because formatting
is so tedious. PL/I's list and data directed I/O is so much
easier to use that it was wholeheartedly adopted in T. By
providing procedures for modifying the number of decimal places
and the number of separating blanks to be output, no edit-directed
I/O is needed. Special statements are provided for array I/O so
that, unlike PL/I, arrays can be printed in orderly fashion
without explicit formatting.


Since, in scientific computation, almost as much time is spent
staring at pages of numbers as at pages of program text, much
thought was given to the best format for displaying numbers.
In accordance with the "engineering format" used on Hewlett-
Packard calculators and with standard metric practice [GH service
Section 1977], exponents are forced to be multiples of 3. As
figure 2, an excerpt from the example program's output, shows,
this convention has a histogramming effect that concentrates the
information in the leading digit, as opposed to splitting it
between the leading digit and the exponent, which are often
separated by 14 columns. The use of parentheses to surround the
exponent, like the legality of imbedded blanks, was suggested by
mathematical tables. This notation separates the exponent from
the mantissa more distinctly than the usual E format.
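
A rough Python sketch of such a number format follows. It is not
the T runtime's algorithm: the function name `eng_format`, the
`sig` parameter, and the rounding details are assumptions chosen
only to show exponents restricted to multiples of 3 and written in
parentheses.

```python
import math


def eng_format(x, sig=5):
    """Format x with sig significant digits and an exponent that is
    a multiple of 3, written in parentheses after the mantissa."""
    if x == 0:
        return "0."
    exp10 = math.floor(math.log10(abs(x)))
    # Round first, so that a mantissa like 9.99999 becomes 10.0 ...
    scaled = round(x / 10**exp10, sig - 1)
    if abs(scaled) >= 10:
        # ... and is then renormalized.
        scaled /= 10
        exp10 += 1
    exp3 = 3 * (exp10 // 3)          # largest multiple of 3 <= exp10
    lead = exp10 - exp3              # 0, 1, or 2 extra leading digits
    mantissa = scaled * 10**lead
    decimals = max(sig - 1 - lead, 0)
    text = f"{mantissa:.{decimals}f}"
    return text if exp3 == 0 else f"{text} ({exp3})"
```

Because the exponent moves in steps of 3, the number of digits
before the decimal point carries the remaining magnitude
information.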
DISCUSSION.


figure  2 


Following Kernighan+Plauger [1976], the initial implementation is
unsophisticated [Cower 1976]. Nevertheless, the preprocessing is
less costly than the PL/I compile, so the overall results are
quite satisfactory. (The evaluation looks even better if one
compares PL/I + T against PL/I + PL/I's macro preprocessor.)

Most of the processor cost lies in basic I/O; by integrating the
macro processor with the language translator, this cost has been
minimized [Kantorowitz 1976]. Much of the two man-months spent
in implementation was spent in understanding nooks and crannies
of PL/I.

T is not intended to replace any existing languages. For
distributing mathematical software, Fortran remains the only
practical medium; for character processing, something like PL/I
or SNOBOL should be used. Still, for the bulk of scientific
computation, T ought to be the easiest to use, particularly since
it coexists comfortably with Fortran and PL/I. On the other
hand, one can imagine ways that T might be improved, as well.

Features omitted for ease of implementation include:

- trimmed arrays, like X(2:N)

- procedure results of general type

- conditional boolean operators that do not evaluate their
  arguments when it is possible to avoid doing so

- a swap operator

For other features, no entirely satisfying design was apparent:

- strings

- more general procedure calls (such as indefinite number and
  type of arguments)

- a means of constructing arrays directly from components, as
  a string constant constructs a string from individual characters

- a means of specifying the invocation graph of who calls whom

Perhaps  the  most  fundamental  though  unavoidable  flaw  is  that, 
unlike  LISP,  the  language  is  not  trivial,  and  therefore  programs 
cannot  be  trivially  manipulated. 


2.0 JUSTIFICATION FOR STILL ANOTHER SET OF GRAPHICS PRIMITIVES.
The next example of restyling is a simple but reasonably complete
interface for noninteractive device-independent graphics. In
addition to the basic line drawing primitives, higher level
procedures are provided for displaying functions of one or two
variables. This interface has been implemented as a library of
PL/I procedures which call the SLAC Unified Graphics package
written by Robert Beach [1978].

Unified Graphics, with its emphasis on the ability to drive
displays like the IBM 2250, is troublesome to use directly for
function plots and the like. In contrast, Top Drawer, another
graphics system at SLAC, allows for function plots but little
else. The collection described in detail in Appendix G is meant
to strike a useful balance between these two extremes, and
contains most of the features of DISSPLA important for scientific
computation.

2.1 ESTABLISHING THE ENVIRONMENT. The following excerpt from
the example program given in section 1.1 above illustrates
typical preparation for plotting:

    STRUCTURE: PF    # PLOT FILE
        INTEGER (500): WORK
        REAL (2): LL, UR,    # CORNERS OF RECTANGULAR DOMAIN
            ORIGIN,          # FOCAL POINT FOR SURFACE PLOT
            X0, SCALE        # COORDINATE TRANSFORMATION PARAMETERS
    GOPEN ('VEP12FF', PF)
    GPICT (PF)
    SCALE := 0.3333
    X0 := -0.5/SCALE(1)
    GTRAN1 (X0, SCALE, PF)

The plot area PF is used to remember various options and to
buffer low level plotter instructions. This work area is
initialized by the GOPEN call, which specifies the output device.
(In the current implementation, no corresponding JCL changes are
necessary.) The ease with which devices may be changed is very
useful in tuning a plot for publication.

For compatibility with numerical procedures, REAL variables are
in full precision, not short. At the start of each new picture,
which might be a screenful on a CRT or an 8.5 by 11" page on an
electrostatic plotter, GPICT is called.

All plotting is done relative to a user coordinate system, which
is specified by calling
    GTRAN ( F, PF )
where F is the name of a procedure which, when called in the form
    F ( X, W, PF )
with
    REAL (N): X      # N <= 10
    REAL (2): W
will map the point X in user coordinates into a point W in the
unit square [0,1]x[0,1]. Normally W(1) is thought of as
horizontal and W(2) as vertical. By extending PF, the user can
pass parameters to F. For convenience, the default
transformation maps
    W := SCALE * ( X - X0 )
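
The default mapping is applied componentwise. A minimal Python
sketch follows; the function name `default_transform` is an
assumption, and the sample values follow the excerpt from the
example program (SCALE of one third, X0 := -0.5/SCALE).

```python
def default_transform(x, x0, scale):
    """Map user coordinates x into the unit square, elementwise:
    W := SCALE * (X - X0)."""
    return [s * (xi - x0i) for xi, x0i, s in zip(x, x0, scale)]


scale = [1 / 3, 1 / 3]
x0 = [-0.5 / s for s in scale]      # gives -1.5 in each coordinate
center = default_transform([0.0, 0.0], x0, scale)
corner = default_transform([-1.5, 1.5], x0, scale)
```

With these settings the user-coordinate square from -1.5 to 1.5 is
mapped onto the unit square, so the user origin lands at its
center.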


2.2 DRAWING, DIMENSIONING, AND FUNCTION GRAPHING. The basic
drawing commands are GJUMP, GDRAW, and GTEXT for drawing lines
and adding text. If a nonlinear coordinate system has been
specified, GDRAW produces a piecewise linear approximation to the
implied curve.

A procedure GGRAPH is provided which automatically samples
function values, sets up an appropriate scaling, graphs the
function, and dimensions the graph using round numbers in a style
consistent with the format used by T. Figure 3, taken from Chan
[1976], is a typical plot.

The scheme for choosing round numbers is based on the algorithm
by Dixon+Kronmal [1965]. Experience and an informal survey of
what people would accept as being "round numbers" led to various
refinements. As in Unified Graphics, the choice is optimized
over a reasonable number of major tick marks. The total number
of tick marks, major and minor, is not allowed to be either too
dense or too sparse. For a while, the number of minor tick marks
was chosen so that each interval had length 10**k, but for input
data limits (20,70) the resulting tick marks were at
(-100,0,100,200), so this rule had to be relaxed to "either
length 10**k or midpoint of major interval." If the difference
between the data limits is small compared to the magnitude of the
limits themselves (as occurs for example in plotting a nearly
constant function), then the labels may become unreasonably
large. Special provision is made for this case.
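
For readers unfamiliar with round-number scaling, the following
Python sketch shows the general idea using the common 1-2-5
heuristic. It is not the Dixon+Kronmal algorithm, nor the refined
scheme described above; the function name `round_ticks` and the
`max_ticks` parameter are assumptions.

```python
import math


def round_ticks(lo, hi, max_ticks=6):
    """Return tick positions that cover [lo, hi], spaced by a step
    of 1, 2, or 5 times a power of ten."""
    if hi <= lo:
        raise ValueError("need lo < hi")
    raw = (hi - lo) / (max_ticks - 1)       # ideal, unrounded step
    k = math.floor(math.log10(raw))
    step = 10.0 ** (k + 1)
    for m in (1, 2, 5):
        candidate = m * 10.0**k
        # Small tolerance guards against floating point noise.
        if candidate >= raw * (1 - 1e-12):
            step = candidate
            break
    first = math.floor(lo / step)
    last = math.ceil(hi / step)
    return [i * step for i in range(first, last + 1)]
```

For the data limits (20,70) mentioned above this yields ticks at
20, 30, ..., 70; the tick count may slightly exceed `max_ticks`
because the range is rounded outward.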

Other routines are available for scatter, surface, and contour
plots. The contour computation uses piecewise quadratic surface
fitting to ensure smooth contours and proper representation of
critical points [Marlow+Powell 1976]. Figure 4 presents output
from the example program, which computes hill-climbing
trajectories for a three-gaussian-peak terrain.


figure  4 


CONCLUSION. With a level of effort comparable to writing a
Fortran preprocessor, we have created, by compiling into PL/I, a
language substantially better than Fortran or its derivatives.
Since PL/I problems cannot be altogether avoided by this
approach, further work on a language like T could be useful.
Perhaps the effort would be better spent on making LISP a
practical language for scientific computation by building on the
research in symbolic computation.

Like PL/I, Unified Graphics is good for a wide range of
applications. But in practice, many people won't use either.
For languages, they stick to Fortran; for graphics, they plot by
hand or not at all. In both cases it has proven possible to
cheaply restyle the existing system, via a preprocessing phase or
driver routines, in order to create more agreeable tools.
APPENDIX T


Report on the programming language T


    "trim: free from anything extraneous;
    having clean lines or proper proportion;
    the state of readiness for action or use;"
                            Webster's Third New International

    "Everything should be as simple as possible,
    but no simpler."
                            Einstein

    "What it lies in our power to do,
    it lies in our power not to do."
                            Aristotle

    "In all spheres, the true craftsman is
    the one who thoroughly understands his tools."
                            Hoare

    "Let someone else do the hard part."
                            Kernighan + Plauger


TOKENS. Program text is made up of the following tokens:

keyword - one of the following:
    CASE
    COMPLEX
    ELSE
    FIRST
    FOR
    FORTRAN PROCEDURE
    FORTRAN PROCEDURE VARIABLE
    GET
    GET ARRAY
    GET DATA
    INTEGER
    NO
    PROCEDURE
    PROCEDURE VARIABLE
    PUT
    PUT ARRAY
    PUT DATA
    PUT DATA ARRAY
    REAL
    STRUCTURE
    WHILE
    YES


identifier - a letter optionally followed by more letters
    and digits.

integer-constant - one or more digits

real-constant - one or more digits, a ".", possibly more
    digits, a "(", possibly a sign, one or more digits, and
    a ")". Either the decimal point and succeeding digits
    or the parentheses and exponent, but not both, may be
    omitted. Thus 1., 0.23, 6.22 (-23), 1 (-6), and
    3.14 159 (+00) are all legal real-constants.

string-constant - a sequence of characters enclosed in
    apostrophes. Apostrophes are not allowed in the string
    proper.

delimiter - one of the following:
    eor () Id: . , & | :
    (The last six are called relational-operators.)

Except in string-constants, blanks are insignificant and may be
used freely for clarity. Text from a "#" up to the next "eor"
(end-of-record) is treated as a comment.
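
The real-constant rule can be checked mechanically. The Python
sketch below is an illustration under stated assumptions: the
function name `parse_real` is hypothetical, an optional "+" or "-"
is accepted inside the parentheses, and blanks are simply deleted
before matching since they are insignificant.

```python
import re

# digits, optional "." with more digits, optional "(" sign digits ")"
_REAL = re.compile(r"^(\d+)(\.\d*)?(\(([+-]?\d+)\))?$")


def parse_real(text):
    """Return the value of a T-style real-constant, or raise
    ValueError if the text is not one."""
    s = text.replace(" ", "")
    m = _REAL.match(s)
    # The point and the exponent may not both be omitted; that
    # would be an integer-constant.
    if m is None or (m.group(2) is None and m.group(3) is None):
        raise ValueError(f"not a real-constant: {text!r}")
    mantissa = m.group(1) + (m.group(2) or "")
    exponent = int(m.group(4)) if m.group(4) else 0
    return float(f"{mantissa}e{exponent}")
```

The parenthesized exponent plays the role of Fortran's E exponent,
so 6.22 (-23) denotes 6.22 times ten to the power -23.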


PROCEDURES.

    program     = {procedure} .
    procedure   = identifier [ "(" identifiers ")" [result] ] "eor"
                  begin
                      {declaration}
                      {statement}
                  end .
    identifiers = {identifier ","} identifier .
    result      = identifier .
    begin       = "((" "eor" .
    end         = "))" "eor" .

Parameters are passed using call-by-reference. "result" may be
used just like any other formal parameter, in particular as a
destination, but must be a scalar.

If the parameter list is omitted, the procedure is assumed to be
the top level main program.


DECLARATIONS.

    declaration = scalar-type [ "(" bounds ")" ] ":" identifiers "eor"
                | "STRUCTURE" identifiers "eor"
                  begin
                      {declaration}
                  end
                | ["FORTRAN"] "PROCEDURE" ["VARIABLE"]
                  [ "()" scalar-type ] identifiers "eor" .
    scalar-type = "INTEGER"
                | "REAL"
                | "COMPLEX" .
    bounds      = [bounds ","] [expression ":"] [expression] .

No global variables are allowed; communication occurs only
through parameter lists. Declarations reserve storage on a
stack; the variables are undefined until first assigned to.

If the <expression>: part of a bound is omitted, 1: is assumed.
The second <expression> should be omitted for a formal array
parameter, and an appropriate value will be taken from the length
of the corresponding actual array argument. The origin of an
actual array argument need not match the origin of the
corresponding formal array parameter. For example, if the actual
argument A was declared REAL(0:7): A and the formal parameter B
was declared REAL(): B, then B(8) will correspond to A(7).

STRUCTURES are data areas assumed to be decomposed as indicated
in the subdeclarations. Thus if actual parameter A is declared

    STRUCTURE: A
        INTEGER: I, K
        REAL: X
        COMPLEX: Z

and corresponding formal parameter F is declared

    STRUCTURE: F
        INTEGER: J, L
        COMPLEX: W

then A.I and F.J correspond, but A.Z and F.W do not. (If F.W is
assigned to, both A.X and A.Z may be destroyed.)
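
The overlap can be worked out from the unit sizes given in the
implementation notes (INTEGER 1 unit, REAL 2 units, COMPLEX 4
units). The Python sketch below assumes fields are laid out
consecutively with no padding; the names `layout` and `overlaps`
are hypothetical.

```python
SIZES = {"INTEGER": 1, "REAL": 2, "COMPLEX": 4}


def layout(fields):
    """Return {name: (start, end)} in storage units, end exclusive."""
    offsets, position = {}, 0
    for name, scalar_type in fields:
        offsets[name] = (position, position + SIZES[scalar_type])
        position += SIZES[scalar_type]
    return offsets


def overlaps(a, b):
    """True if two half-open unit ranges share any storage."""
    return a[0] < b[1] and b[0] < a[1]


A = layout([("I", "INTEGER"), ("K", "INTEGER"),
            ("X", "REAL"), ("Z", "COMPLEX")])
F = layout([("J", "INTEGER"), ("L", "INTEGER"), ("W", "COMPLEX")])
```

Under that assumption F.W occupies units 2 through 5, which covers
all of A.X and the first half of A.Z.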


A PROCEDURE VARIABLE is a variable that may refer to various
actual procedures; in contrast, a PROCEDURE is literally the
name of a procedure.


STATEMENTS.

    statement = procedure-call "eor"
              | destination ":=" expression "eor"
              | "GET (" identifiers ")"
              | "GET ARRAY (" identifier ")"
              | "GET DATA (" identifiers ")"
              | "PUT (" arguments ")"
              | "PUT ARRAY (" destination ")"
              | "PUT DATA (" arguments ")"
              | "PUT DATA ARRAY (" destination ")"
              | "CASE" "eor"
                begin
                    { boolean-expression "eor"
                      begin
                          {statement}
                      end }
                    [ "ELSE" "eor"
                      begin
                          {statement}
                      end ]
                end
              | "WHILE (" boolean-expression ")" "eor"
                begin
                    {statement}
                    [ "FIRST" "eor"
                      {statement} ]
                end
              | "FOR ("
                ( expression ("<"|"<=") destination ("<"|"<=") expression
                | expression (">"|">=") destination (">"|">=") expression ) ")" "eor"
                begin
                    {statement}
                end .

GET reads from the next record of the input data a sequence of
constants, separated by commas. For GET DATA, each value should
be prefixed with "identifier="; the input values need not
appear in the same order as the corresponding identifiers in the
GET DATA, and if a value is omitted, the variable is left
unchanged. PUT DATA and PUT write out the current values of the
identifiers, labelled or unlabelled, in an intelligent fashion.

In other languages,

    CASE
        cond1
            case1
        cond2
            case2
        ELSE
            case3

might be written as:

    IF ( cond1 ) THEN
        case1
    ELSE IF ( cond2 ) THEN
        case2
    ELSE
        case3

Similarly,

    WHILE cond
        part a
      FIRST
        part b

might be translated as:

           GOTO FIRST
    TOP:   part a
    FIRST: part b
           IF ( cond ) THEN GOTO TOP

If the FIRST line is omitted, it is assumed to be at the end of
the loop; that is, part b is empty.
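
The same control flow can be written without GOTO. The Python
sketch below mirrors the translation above; the helper name
`while_first` and the callable arguments are assumptions made for
illustration.

```python
def while_first(cond, part_a, part_b):
    """Enter the loop at FIRST: run part b, test the condition,
    and run part a only if the loop continues."""
    while True:
        part_b()
        if not cond():
            break
        part_a()


trace = []
state = {"n": 0}


def part_b():
    state["n"] += 1
    trace.append("b")


while_first(lambda: state["n"] < 3, lambda: trace.append("a"), part_b)
```

Part b runs once more than part a, which is the n-and-a-half-times
pattern.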

Finally,

    FOR ( LOW <= I <= HIGH )
        loop

would be translated as

    FOR I = LOW TO HIGH
        loop

and

    FOR ( HIGH > I >= LOW )
        loop

as

    FOR I = HIGH-1 BY -1 TO LOW
        loop

The destination and expressions controlling a FOR loop may not be
modified inside the loop.
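
The mapping from the two relational operators to loop bounds can
be sketched in Python for integer control variables. The function
name `for_values` is hypothetical, and only the integer case is
covered.

```python
def for_values(left, op1, op2, right):
    """Values taken by I in  FOR ( left op1 I op2 right )."""
    if op1 in ("<", "<=") and op2 in ("<", "<="):
        start = left + (1 if op1 == "<" else 0)
        stop = right - (1 if op2 == "<" else 0)
        return range(start, stop + 1)
    if op1 in (">", ">=") and op2 in (">", ">="):
        start = left - (1 if op1 == ">" else 0)
        stop = right + (1 if op2 == ">" else 0)
        return range(start, stop - 1, -1)
    raise ValueError("both operators must point the same way")
```

A strict inequality moves the corresponding bound one step inward,
which is why HIGH > I >= LOW starts at HIGH-1.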


EXPRESSIONS.

    destination = {identifier "."} identifier
                | identifier "(" subscripts ")"
                | "@" procedure-identifier .
    subscripts = [subscripts ","] [expression] .
    procedure-call = procedure-identifier [ "(" arguments ")" ] .
    procedure-identifier = identifier .
    arguments = [arguments ","] ( expression | string-constant
                | YES | NO ) .
    expression = arithmetic-expression
               | boolean-expression .
    arithmetic-expression = ["-"] term
               | arithmetic-expression "+" term
               | arithmetic-expression "-" term .
    term = factor
         | term "*" factor
         | term "/" factor .
    factor = primary
           | primary "**" primary
           | primary primary .
    primary = integer-constant
            | real-constant
            | destination
            | procedure-call
            | "(" arithmetic-expression ")" .
    boolean-expression = [boolean-expression "|"] boolean-factor .
    boolean-factor = [boolean-factor "&"] boolean-secondary .
    boolean-secondary = ["¬"] boolean-secondary
                      | boolean-primary .
    boolean-primary = YES | NO
                    | destination
                    | procedure-call
                    | arithmetic-expression relational-operator arithmetic-expression .

If a subscript is empty, it is assumed that an entire row or
column is being referenced. If all the subscripts are empty, the
parentheses and commas may also be omitted. Array expressions
are performed elementwise.

To refer to a procedure without actually invoking it, put an "@"
before the procedure identifier.

If the operands are mixed INTEGER and REAL the result is REAL; if
either is COMPLEX, the result is COMPLEX. Dividing an INTEGER by
an INTEGER and raising an INTEGER to a power are illegal.
PRIMITIVES AND MACROS. The following macros are predefined:

    LENGTH (array, i)     size of array in the i-th dimension
    DADD, DMUL, DDIV      extended precision arithmetic
    CEIL, FLOOR, SIGN, ABS, FLOAT, RE, IM
    MAX, MIN ( x1, x2, x3, ... )
    MOD ( x1, x2 )        smallest nonnegative r such that
                          ( x1 - r ) / x2 is integral
    PI, EPS, MAXREAL, MAXINTEGER
    ACOS, ASIN, ATAN, COS, EXP, LN or LOG,
    LOG10, SIN, SQRT, TAN
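
The MOD definition differs from the remainder operator of many
languages when arguments are negative. A minimal Python sketch for
integer arguments follows; the name `t_mod` is hypothetical.

```python
def t_mod(x1, x2):
    """Smallest nonnegative r such that (x1 - r) / x2 is integral."""
    if x2 == 0:
        raise ZeroDivisionError("MOD by zero")
    # Python's % with a positive modulus is already nonnegative.
    return x1 % abs(x2)
```

For example, MOD(-7, 3) is 2 under this definition, since
(-7 - 2) / 3 = -3 is integral.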

The following procedures are predefined:

    NEXT LINE [(n)]          skip to start of next line,
                             then put out n-1 blank lines
                             (if no argument then n=1)
    NEXT PAGE                page eject
    END OF INPUT             BOOLEAN expression
    BLANK SEPARATION (n)     number of blanks to be left
                             between output values (default 1)
    INTEGER DIGITS (n)       number of digits for integer
                             output (default 5)
    REAL DIGITS (n)          ... for real output (default 5)
    REAL LEADING DIGITS (n)  0<=n<=3 (default 3)
    DATE AND TIME            is replaced by: 'dd mmm 19yy hh:mm'

Macros are evaluated much like procedure calls. First, each
argument is evaluated; then the macro is applied to its arguments
by inline expansion according to the macro definition; finally,
the replacement text is regarded as fresh input. During the
evaluation, tokens beginning with " have the first stripped
off. (This allows macros to be temporarily "hidden.")

DEFINE ( identifier, replacement text ) defines a macro. The
replacement text is a sequence of tokens, possibly containing eor
and matched parentheses. Places in the replacement text where
arguments are to be inserted during expansion are indicated by $
followed by a digit or letter.

IFELSE (a,b,c,d) is replaced by c if the token a is the same as b,
otherwise by d. Either a or b may be empty. For example, the
text
    DEFINE (VERBOSE, YES)
    IFELSE (VERBOSE, YES, PUT DATA (X),)
would be replaced by
    PUT DATA (X)
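
The example can be traced with a toy Python model of argument
evaluation followed by IFELSE. This is a sketch, not the T macro
processor: the functions `expand` and `ifelse` and the dictionary
of definitions are assumptions, and only single-token macros
without arguments are modelled.

```python
def expand(token, macros):
    """Expand a token repeatedly, since replacement text is
    regarded as fresh input. Stops on a cycle."""
    seen = set()
    while token in macros and token not in seen:
        seen.add(token)
        token = macros[token]
    return token


def ifelse(a, b, c, d, macros):
    """Arguments a and b are evaluated first, then compared."""
    return c if expand(a, macros) == expand(b, macros) else d


macros = {"VERBOSE": "YES"}          # DEFINE (VERBOSE, YES)
result = ifelse("VERBOSE", "YES", "PUT DATA (X)", "", macros)
```

If VERBOSE were not defined, the comparison would fail and the
empty fourth argument would be substituted, removing the output
statement.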

A macro defined before the first procedure applies globally;
other macros, which may temporarily redefine the global ones,
apply only to the procedure in which they are defined.
APPENDIX I


T implementation notes


FILE FORMAT. The Report mentions only an abstract end-of-record
delimiter and not any particular file format. This is because
the assumptions of local text editors are of overwhelming
importance.

The current implementation provides for card image files in
which columns 73 through 80 are ignored. In order to obtain text
records longer than 72 characters, continuation lines, flagged by
a blank in column 1, may be used. Comments, starting with a "#",
end in column 72.

More technically, each card has an end-of-line character EOL
appended to it and each record has an EOR appended after the last
EOL. Thus

    REAL:  # constants
     PI, EPS

is translated to

    REAL: #constants EOL PI, EPS EOL EOR

The T processor discards blanks and text from a # up to the next
EOL. The runtime I/O package discards blanks, comments, and
begins and ends. On the other hand, the PRINT program makes use
of EOLs to intelligently format its listing.

The input/output procedures effectively append an infinite string
of EOF characters to the end of the file. The characters EOL,
EOR, and EOF are represented internally as the ASCII control
characters US, RS, and FS.


OTHER REMARKS ON USING T. Strings may be passed through a
procedure, for example from a high level routine to a graphics
primitive, by declaring a formal parameter as INTEGER(1).

Various options may be invoked by a macro call of the form:
    option ( ? )
where ? is either ON or OFF [default value is in brackets]
and option is

    RECURSIVE       [OFF]  recursive procedures
    SUBCHECK        [OFF]  subscript checking
    SHORT           [OFF]  short precision
    FORTRAN FACADE  [OFF]  procedure looks to the
                           outside world like Fortran
    UNDERFLOW       [ON]   turn off underflow error messages
                           in the current procedure and its descendants
                           (reals that underflow are set to 0)

Arrays may have at most S subscripts.


For the purposes of STRUCTURE alignment, the sizes of the scalar
types are:

    INTEGER   1 unit
    REAL      2 units
    COMPLEX   4 units

REALs and COMPLEXes should start an even number of units into the
STRUCTURE for fastest access.


JCL AT SLAC. In order to invoke the PRINT program (designed to
intelligently list a set of T procedures), use the PRINT
catalogued procedure stored in WYL.CG.EHG.E. An example of a
PRINT run:

    // JOB
    //PROCLIB DD DSN=WYL.CG.EHG.E,DISP=SHR
    //P EXEC PRINT
    //INPUT DD *
    ... T procedures ...

The PRINT program recognizes three commands, which look like
comments:

    #XE        causes the next line to appear on the
               next page
    #XTtitle   causes the characters immediately appearing
               after "#XT" to appear in the title line,
               and the next line will appear on the next
               page
    #XLn       causes only indentions of at most level n to
               appear, e.g. #XL0 will cause indention to
               be suppressed

In order to invoke the precompiler and compile the resulting
PL/I, use the TC catalogued procedure stored in WYL.CG. An
example of a compile-and-go run:

    // JOB
    //PROCLIB DD DSN=WYL.CG.EHG.E,DISP=SHR
    //C EXEC TC
    //INPUT DD *
    ... T procedures ...
    //G EXEC PLIG

Provision has been made for a PL/I dump. To get this, add the
JCL card:

    //PLIDUMP DD SYSOUT=A,DCB=(RECFM=FBA,LRECL=133,BLKSIZE=1330)

APPENDIX G


Report  on  the  graphics  interface  G 


"In  improving  exploratory  data  analysis,  we  need  to  find 
new  questions  to  ask  of  the  data  (probably  the  hardest 
task) , and  new  ways  to  ask  old  questions.  Throughout, 
arithmetic  as  a basis  for  preparing  pictures  is  likely 
to  be  the  keynote.  It  is  most  important  that  we  see  in 
the  data  those  things  we  do  not  expect  — pictures  help 
us  in  this  far  more  than  numbers,  though  we  can  gain  a 
lot just by what numbers we use."

John  Tukey 


ESTABLISHING THE ENVIRONMENT. In order to remember various
options and settings, such as character size and line type, and
as a buffer for low-level plotting commands, a work area PLOT is
provided to the graphics procedures. This may be declared as:

    STRUCTURE: PLOT
        INTEGER (500): WORK
        STRUCTURE: USER
            ...

The workspace USER, which may be as large or small as desired,
allows parameters to be passed to user procedures called by G.

To initialize PLOT at the start of a run, call
    GOPEN ( DEVICE, PLOT )
where
    DEVICE is a string containing one of the codes:
        CAL7ICH   microfiche
        PDS4013   Tektronix, Hewlett-Packard
        VEP12FF   Versatec
    Note that if VEP12FF, EXT SORT is specified,
    then the reordering of the plot commands
    necessary for the Versatec will not be done
    by a main memory sort (greatly reducing the
    run-time memory requirements). In this case
    an external sort step must be supplied.

Before each new picture, including the first, call
    GPICT ( PLOT )

Finally, at the end of the run, call
    GCLOSE ( PLOT )
Omitting this may cause the last picture to be lost.

For convenience, the initial transformation is GTRAI1A, which
performs the mapping
    W := SCALE * ( X - X0 )
where
    REAL (2): W, X, X0, SCALE
and X0 = (0,0), SCALE = (1,1). To change these values, which are
saved in PLOT.WORK, call
    GTRAN1 ( X0, SCALE, PLOT )

Line types, character sizes, and character angles are specified
by calling
    GLTYPE ( LTYPE, PLOT )
    GCSIZE ( CSIZE, PLOT )
    GCANGL ( CANGL, PLOT )
where
    LTYPE is a string containing one of the codes:
        SOLID, DOT, DASH, or DOT-DASH
    REAL: CSIZE,    # character spacing; initially 1;
          CANGL     # character angle, in radians
                    # counterclockwise from horizontal;
                    # initially 0;


BASIC DRAWING. To jump straight to the point X, call

    GJUMP ( X, PLOT )

where

    REAL (): X    # destination (in user coordinates)

To draw a line from the current position to X, call

    GDRAW ( X, PLOT )

If the coordinate transformation is curvilinear, this produces a
piecewise linear approximation to the implied curve. Following a
change of coordinate system, GJUMP should be called before GDRAW.

To write out text, call

    GTEXT ( X, PRI, SEC, PLOT )

where

    REAL (): X    # location for center of first character
    PRI, SEC are strings of length at most 255

In order to obtain a large alphabet, text is presented to GTEXT
using a pair of strings. Every pair of corresponding characters
in the primary and secondary strings denotes one character in the
extended alphabet.


The secondary character for

    upper Roman    lower Roman    lower Greek    upper Greek
is
    (space)        L              G              H

For  common  special  characters  the  secondary  character  is  also  a 
space.  The  primary  character  for  Greek  letters  is  the  first 
letter  of  its  English  name  or  one  of  the  special  cases: 

H eta  F phi  V omega 

Q theta  Y psi 
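The paired-string convention can be illustrated with a small Python decoder. It is a sketch under stated assumptions: the function name `decode` and the table `GREEK_NAMES` are hypothetical, the Greek letters other than the five special cases are filled in from the "first letter of the English name" rule, control and special-symbol codes are not handled, and decoded Greek letters are returned as their English names.

```python
# Primary letter -> English name of the Greek letter (special cases: H, F, V, Q, Y).
GREEK_NAMES = {
    "A": "alpha", "B": "beta", "G": "gamma", "D": "delta", "E": "epsilon",
    "Z": "zeta", "H": "eta", "Q": "theta", "I": "iota", "K": "kappa",
    "L": "lambda", "M": "mu", "N": "nu", "X": "xi", "O": "omicron",
    "P": "pi", "R": "rho", "S": "sigma", "T": "tau", "U": "upsilon",
    "F": "phi", "C": "chi", "Y": "psi", "V": "omega",
}


def decode(primary, secondary):
    """Decode a primary/secondary string pair into a list of characters."""
    secondary = secondary.ljust(len(primary))  # missing secondaries act as spaces
    out = []
    for p, s in zip(primary, secondary):
        if s == " ":        # upper Roman and common special characters
            out.append(p)
        elif s == "L":      # lower Roman
            out.append(p.lower())
        elif s == "G":      # lower Greek
            out.append(GREEK_NAMES[p])
        elif s == "H":      # upper Greek
            out.append(GREEK_NAMES[p].capitalize())
        else:               # control and symbol codes are outside this sketch
            out.append("?")
    return out


# "(n-1)! = Gamma(n)" written with a primary and a secondary string.
tokens = decode("(N-1)!=G(N)", " L" + " " * 5 + "H L ")
```

Keeping the primary string in upper case and moving the alphabet selection into a parallel string means a text can be typed on an upper-case-only device and still address four alphabets.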

Additional  special  and  control  characters  are  available: 


C begin  superscript 
end 

2 begin  subscript 

3 end 

4 save  position  1 

5 restore 

6 save  position  2 

7 restore 

8 save  position  3 

9 restore 

2  increase  size 
P decrease 

IT  half  up 

2 down 

3 third  up 

4 down 

5 sixth  up 

6 down 

OU  one  back 

1 half  forward 

2 back 

3 third  forward 

4 back 

5 sixth  forward 

6 back 

HZ  is  an  element  of 
N is  not  an  element  of 
2 there  exists 
A for  all 
I intersect 
0 union 

< is  strictly  contained  in 
> strictly  contains 
L is  contained  in 
B contains 

UA  up  arrow 
D down 
L left 
B right 

B bidirectional 


JS  Integral, 

contour  integral 

p partial-d 
D del 

S plus-or-minus 
X times 
: divided  by 

♦ abstract  plus 
* abstract  times 
2 radical 
0 infinity 
/ back  slash 
( left  square  bracket 
) right  square  bracket 
L left  angle  bracket 
R right  angle  bracket 
A left  curly  bracket 
Z right  curly  bracket 
< less  or  equal 
* not  equal 
2 equivalent  to 
0 proportional  to 
> greater  equal 
0 degree 
N section 
6 dagger 
P double  dagger 
H h bar 
B lambda  bar 
0 underscore 
T overscore 
OP  cross 

1 diagonal  cross 
2 diamond 
3 box 
4 star 

5 diagonal  star 
6 cross  with  serifs 
7 diagonal  cross  with  serifs 
8 compass  rose 
9 octagon 


For example, the definition of the gamma function is given by:
GTE  XT  (X,  ' (N-  1)  !=G  (N)  -I40052P05  fiO-T  1T0N- 1 1DT  • , 

• L H L SCCSCCC  C LC  LCLCL  CLL'rPLOT> 
DIMENSIONING. Given limits on the data range, suitable values
for plotting the data may be obtained by calling

    GSCALE ( DATA MIN, DATA MAX,                        # input
             LABEL MIN, LABEL MAX, EXP, MAJOR, MINOR )  # output

The data will then be bracketed by LABEL MIN*10**EXP and LABEL
MAX*10**EXP, which are round numbers in engineering format.

GTIC acts somewhat like a ruler, drawing an unlabelled axis with
large and small tic marks:

    GTIC ( L, H, MAJOR, MINOR, OFF, PLOT )

where

    REAL (): L, H,        # endpoints of axis;
             OFF          # offset coordinates of major tic mark
                          # endpoints, which determine the size
                          # and direction of the tic marks;
    REAL: MAJOR, MINOR    # number of major and minor tic marks,
                          # counting endpoints; thus a yardstick
                          # might have MAJOR=4, MINOR=13;
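The counting convention for tic marks can be made concrete with a short Python sketch. The function `tic_positions` is hypothetical, positions are scalar distances along the axis rather than two-dimensional points, and MINOR is assumed to count the tic marks within one major interval, endpoints included, which is the reading suggested by the yardstick example.

```python
def tic_positions(low, high, major, minor):
    """Return (major_tics, all_tics) along an axis running from low to high.

    major: number of major tic marks, counting both endpoints.
    minor: number of tic marks per major interval, counting both endpoints.
    """
    major_gaps = major - 1          # major intervals along the axis
    minor_gaps = minor - 1          # subdivisions of each major interval
    total = major_gaps * minor_gaps
    all_tics = [low + (high - low) * i / total for i in range(total + 1)]
    major_tics = all_tics[::minor_gaps]
    return major_tics, all_tics


# A yardstick in inches: a major tic every foot, a minor tic every inch.
feet, inches = tic_positions(0.0, 36.0, 4, 13)
```

Under this reading a yardstick has 4 major tic marks (0, 12, 24, 36 inches) and 37 tic marks in all.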

GLAB adds integer labels to the tic marks produced by GTIC:

    GLAB ( L, H, MAJOR, OFF, LOW, HIGH, PLOT )

where

    INTEGER: LOW, HIGH    # label values at endpoints;
    REAL (): OFF          # offset of first character of label
    L, H, MAJOR are as in GTIC.

GFORM1 lays out a form suitable for scatter plots and function
graphs:

    GFORM1 ( GPRI, GSEC,
             XPRI, XSEC, XLOW, XHIGH,
             YPRI, YSEC, YLOW, YHIGH, PLOT )

where

    GPRI, GSEC, XPRI, XSEC, YPRI, YSEC are pairs of strings
        defining the general title, X- and Y-axis labels;
    REAL: XLOW, XHIGH, YLOW, YHIGH    # specifies the data limits

GFORM1 automatically sets up a coordinate system so that the plot
will fill the screen. To add a function curve, just use GJUMP
and GDRAW.


PLOTTING. GSCAT provides scatter plots:

    GSCAT ( GPRI, GSEC,
            XPRI, XSEC, X,
            YPRI, YSEC, Y, PLOT )

where

    REAL (): X, Y    # data points
    GPRI, ..., YSEC are as in GFORM1.


GGRAPH provides graphs of functions of one variable:

    GGRAPH ( GPRI, GSEC,
             XPRI, XSEC, A, B,
             FPRI, FSEC, F, PLOT )

where

    REAL: A, B              # endpoints of interval for graphing
    PROCEDURE () REAL: F    # function to be plotted, which
                            # is called in the form:
                            #     Y = F ( X, PLOT )


To plot contours of the surface passing through the points

    ( X(I), Y(J), F(I,J) ),

call

    GCONT ( LL, UR, F, LEVELS, PLOT )

where

    REAL (2): LL, UR    # coordinates of lower left (most
                        # negative) and upper right (most
                        # positive) corners of the rectangle
                        # in the X-Y plane on which data
                        # is given
    REAL (,): F         # data values
    REAL (): LEVELS     # contour levels to be plotted

A uniform grid is assumed.
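Because only the two corners are passed, the grid coordinates X(I) and Y(J) are implied by the shape of F. The Python sketch below shows one way to recover them; the function name `grid_coordinates` is hypothetical, and it is assumed that the first and last rows and columns of F lie exactly on the edges of the rectangle.

```python
def grid_coordinates(ll, ur, nx, ny):
    """Coordinates of a uniform nx-by-ny grid spanning the rectangle LL..UR."""
    xs = [ll[0] + (ur[0] - ll[0]) * i / (nx - 1) for i in range(nx)]
    ys = [ll[1] + (ur[1] - ll[1]) * j / (ny - 1) for j in range(ny)]
    return xs, ys


# A 3-by-5 array of data values on the rectangle [0, 2] x [-1, 1].
xs, ys = grid_coordinates((0.0, -1.0), (2.0, 1.0), 3, 5)
```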


To draw a transect surface plot with hidden lines removed, call

    GSURF ( LL, UR, F,
            AZIMUTH, ELEVATION, ORIGIN, SCALE, PLOT )

where

    REAL: AZIMUTH, ELEVATION, SCALE
    REAL (2): ORIGIN

The coordinate transformation used in GSURF maps P = (X,Y,F) into

    ORIGIN + ( -S1       C1       0  ) (P-C) * SCALE
             ( -S2*C1   -S2*S1    C2 )

where C1 = COS (AZIMUTH), ..., S2 = SIN (ELEVATION), and

    C = ( (LL(1)+UR(1))/2, (LL(2)+UR(2))/2, 0 )

LL, UR, and F are as in GCONT.
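The projection used by GSURF can be written out directly. The following Python sketch applies the 2-by-3 matrix given above to a single point; the function name `gsurf_project` is hypothetical, and angles are assumed to be in radians, as for the character angle.

```python
import math


def gsurf_project(p, ll, ur, azimuth, elevation, origin=(0.0, 0.0), scale=1.0):
    """Project the 3-D point P = (X, Y, F) onto the 2-D plotting plane."""
    c1, s1 = math.cos(azimuth), math.sin(azimuth)
    c2, s2 = math.cos(elevation), math.sin(elevation)
    # C is the center of the data rectangle at height zero.
    dx = p[0] - (ll[0] + ur[0]) / 2.0
    dy = p[1] - (ll[1] + ur[1]) / 2.0
    dz = p[2]
    u = -s1 * dx + c1 * dy
    v = -s2 * c1 * dx - s2 * s1 * dy + c2 * dz
    return (origin[0] + scale * u, origin[1] + scale * v)


# With zero azimuth and elevation the view is along the X axis:
# the horizontal plot coordinate is Y and the vertical one is F.
side_view = gsurf_project((1.0, 2.0, 3.0), (0.0, 0.0), (2.0, 2.0), 0.0, 0.0)
```

The center of the data rectangle always lands on ORIGIN, so changing the viewing angles rotates the surface about that point.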
ABSTRACT

Facts about polynomial interpolation conjectured some time ago by
Bernstein and Erdos and proved recently by Kilgore, Pinkus and the
speaker are discussed. These facts concern the choice of interpolation
points which make the norm of the interpolation map as small as possible.
It is then pointed out that such optimal nodes offer little improvement
over readily available "good" nodes whose use is encouraged. Finally,
the question of "good" interpolation points for polynomial approximation
in several variables is raised.
COMPARING  DIGITAL  FILTERS  WHICH 
PRODUCE  DERIVATIVE  APPROXIMATIONS* 


Charles  K.  Chui,  Philip  W.  Smith,  and  Joseph  D.  Ward 
Department  of  Mathematics 
Texas  A&M  University 
College  Station,  Texas  77843 


ABSTRACT. A method for comparing digital filters which produce
derivative approximations is proposed. A judicious use of the Peano
Kernel Theorem allows one to compare these digital filters rather
cheaply.

1. INTRODUCTION. Digital filtering is often used in a variety of
situations for smoothing data and approximating derivatives. In this
article, we propose a method by which two competing filters may be
judged. For simplicity, we will consider only digital filters of the
type $f'_j = D_j(\beta, f)$, where $\beta = (\beta_0, \ldots, \beta_{r+p})$,
$f = (f_0, \ldots, f_N)$, and

(1)  $D_j(\beta, f) = \sum_{k=0}^{p} \beta_k f_{j-k} + \sum_{k=1}^{r} \beta_{k+p} f'_{j-k}$,

$j = 0, \ldots, N$. Throughout, $f_n$ and $f'_n$ are considered to be zero if
$n < 0$, and $f'_j$ will be an approximation of $f'(j/N)$. The reader can
see that the more general situation including the use of higher order
derivatives can be treated in an analogous manner.

A related problem is to choose the best filter coefficients $\beta_k$ in
the sense of Sard [6]. This would also produce the optimal estimate
analogous to the ideas of Golomb-Weinberger [5]. Basically, this
entails choosing a class of functions $B$ so that the best filter
coefficients $\beta^*$ satisfy the mini-max property

$\inf_{\beta} \sup_{f \in B} |D_N(\beta, f) - f'(1)| = \sup_{f \in B} |D_N(\beta^*, f) - f'(1)|.$


*This  research  was  supported  by  the  U.S.  Army  Research  Office  under 
Grant  No.  DAHC  04-75-G-0186. 
In section 2 we develop some theoretical results and in section 3
we discuss numerical implementation of a specific comparison method.
In particular, it is shown that relatively cheap comparisons may be
made.

2. DISCUSSION OF RESULTS. The type of filters to be considered may
be described as follows. Let


N 

and let $\{f_i\}_{i=0}^{N}$ be data samples taken at $i/N$, $i = 0, \ldots, N$. We
assume that there is a function $f \in H^n$ and

$f_i = f(i/N), \quad i = 0, \ldots, N.$

The type of recursive filter considered here is given by $f'_j = D_j(\beta, f)$,
for $j = 0, \ldots, N$, where $f = (f_0, \ldots, f_N)$, $\beta = (\beta_0, \ldots, \beta_{p+r})$, and
$D_j(\beta, f)$ is as in (1). $f'_j$ is then an approximation of $f'(j/N)$. The
problem is to choose the $\beta_k$'s in the above formula so that the
derivative approximants are "best possible".

There is a classical method to attack this problem dating back to
Golomb-Weinberger [5] and Sard [6]. Assume that (1) is exact for
polynomials of degree less than or equal to $n-1$ and let $B_n$ be the
unit ball of $H^n$.

Proposition 1.2. The following inequality holds for $j = 0, \ldots, N-1$:

$\sup_{f \in B_n} |D_j(\beta, f) - f'(j/N)| \le \sup_{f \in B_n} |D_{j+1}(\beta, f) - f'((j+1)/N)|.$

The proof is derived from considerations of the Peano Kernel Theorem
and we omit the details. Note that the "worst" error must occur at
$j = N$.
Definition 1.3. A $(p+r)$-tuple $\beta^*$ is said to be optimal if

$\sup_{f \in B_n} |D_N(\beta^*, f) - f'(1)| \le \sup_{f \in B_n} |D_N(\beta, f) - f'(1)|$

for all $(p+r)$-tuples $\beta$.

Example 1.4. Consider the recursive filter given by

$f'_j = (1-a)[f_j - f_{j-1}]N + a f'_{j-1} = D_j(a, f).$

This filter is exact for polynomials of degree less than or equal to
one. The optimal $a^* = a^*(N)$ can be computed by minimizing the
formula

$\sup_{f \in B_2} |D_N(a, f) - f'(1)| = \sqrt{(1 - a^{2N})(1 + a + a^2) / [3N(1 - a^2)]}$

for $a \ne \pm 1$. In particular, $a^*(N)$ converges to $-2 + \sqrt{3}$ as $N \to \infty$.
The following may be observed about the above example:

i) The optimization problem is nonlinear.

ii) For each $N$, there is a unique solution $a^*$.

iii) The $a^*_N$ converge as $N$ goes to infinity.

iv) The rate of decrease is of order $O(1/N^{1/2})$.
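The limiting value of the optimal parameter can be checked numerically. The Python sketch below minimizes the closed-form worst-case error of Example 1.4 by a plain grid search; the function names and the grid step are illustrative choices, not part of the paper.

```python
import math


def worst_case_error(a, N):
    """Closed-form sup-error of Example 1.4, valid for |a| != 1."""
    return math.sqrt((1.0 - a ** (2 * N)) * (1.0 + a + a * a)
                     / (3.0 * N * (1.0 - a * a)))


def optimal_a(N, step=1e-4):
    """Grid search over (-0.99, 0.99) for the a minimizing the worst-case error."""
    best_a, best_val = 0.0, worst_case_error(0.0, N)
    k_max = int(round(0.99 / step))
    for k in range(-k_max, k_max + 1):
        a = k * step
        val = worst_case_error(a, N)
        if val < best_val:
            best_a, best_val = a, val
    return best_a, best_val


a_star, err_star = optimal_a(200)
limit = -2.0 + math.sqrt(3.0)   # limiting value stated in the text
```

For large N the factor involving $a^{2N}$ is negligible near the minimum, and setting the derivative of $(1 + a + a^2)/(1 - a^2)$ to zero gives $a^2 + 4a + 1 = 0$, whose root in $(-1, 1)$ is $-2 + \sqrt{3}$.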

For the general problem, we can prove

Theorem 1.5. a) (1) has an optimal solution $\beta^*$.

b) If (1) is exact for all polynomials of degree less
than or equal to $n-1$ and if $p$ is larger than $n-2$, then there exist
positive constants $C_1$ and $C_2$ so that

$C_1 \le \sup_{f \in B_n} N^{n-3/2} |D_N(\beta^*(N), f) - f'(1)| \le C_2.$

Sketch of proof: Part a) is proved in a standard fashion by assuming
that a minimizing sequence is unbounded and then deriving a
contradiction. For the proof of part b), consider the non-recursive
filter;

$\sup_{f \in B_n} |D_N(\beta^*(N), f) - f'(1)|$

is bounded above by the reciprocal of the norm of a certain fundamental
spline. The norm estimates of de Boor ([1, p. 38] and [2, p. 115]) for
such splines now yield the upper bound. To obtain the lower estimate,
note that

$D_N(\beta, f) = \sum_{j=0}^{N} \gamma_j f(j/N).$

The best estimator of this type is $S'(1)$. This follows from [5], where
$S$ is the natural spline of order $2n$ interpolating the data. The
results of Golomb [4] and de Boor [1] and [2] now allow us to obtain a
positive constant $C_1 > 0$ such that

$C_1 N^{-n+3/2} \le \max_{f \in B_n} |S'(1) - f'(1)|.$

This completes the sketch of our proof.

3. COMPUTATIONAL ASPECTS OF COMPARING FILTERS. As in the previous
section we consider filters of the type $f'_j = D_j(\beta, f)$ where $D_j(\beta, f)$
is given in (1). We also assume that $f'_j = D_j(\beta, f)$, $j = 0, \ldots, N$, is
exact for polynomials of degree $\le n-1$. We may obtain an integral
representation of the error over $B_n$ via the Peano Kernel Theorem [3]
of the form

(2)  $\sup_{f \in B_n} |D_N(\beta, f) - f'(1)| = \left[ \int_0^1 [K_\beta^N(t)]^2 \, dt \right]^{1/2}$

where $K_\beta^N(t) = \left[ D_N(\beta, (\cdot - t)_+^{n-1}) - (n-1)(1-t)^{n-2} \right] / (n-1)!$.
Thus, if one wants to know whether $\beta^1$ or $\beta^2$ produces a better
differentiation formula $f'_j = D_j(\beta^i, f)$ $(i = 1, 2)$, it is only necessary
to compare the numbers

(3)  $\int_0^1 [K_{\beta^i}^N(t)]^2 \, dt, \quad i = 1, 2.$


Computing the integral in (3) is made easier by noticing that for
$t \in (j/N, (j+1)/N)$, $K_\beta^N(t)$ is a polynomial of degree less than $n$.
Hence, one could use a Gauss quadrature formula with $n$ points in each
interval of the form $(j/N, (j+1)/N)$. This means that one must evaluate
$K_\beta^N$ at $nN$ points on $[0,1]$, and off-hand one would suspect that the
filter $f'_j = D_j(\beta, f)$ would have to be applied to $nN$ functions of the
type $(x - \tau_i)_+^{n-1}$ where the $\tau_i$ would be the properly scaled Gaussian
points.
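The direct approach just described can be sketched in Python for the filter of Example 1.4, where $n = 2$ and the kernel is $K(t) = D_N(a, (\cdot - t)_+) - 1$. The function names are hypothetical, and the two-point Gauss nodes $\pm 1/\sqrt{3}$ are hard-coded. The result is compared with the square of the closed form from Example 1.4.

```python
import math


def filter_at_end(a, N, f):
    """Run f'_j = (1-a) N (f_j - f_{j-1}) + a f'_{j-1}, j = 0..N; return f'_N.

    Samples and outputs with negative index are taken to be zero.
    """
    prev_f, deriv = 0.0, 0.0
    for j in range(N + 1):
        fj = f(j / N)
        deriv = (1.0 - a) * N * (fj - prev_f) + a * deriv
        prev_f = fj
    return deriv


def peano_kernel(a, N, t):
    """Peano kernel for n = 2: filter applied to (x - t)_+ minus its slope at 1."""
    return filter_at_end(a, N, lambda x: max(x - t, 0.0)) - 1.0


def kernel_norm_squared(a, N):
    """Integral of K^2 over [0, 1] by two-point Gauss quadrature on each panel."""
    g = 1.0 / math.sqrt(3.0)
    half = 0.5 / N
    total = 0.0
    for j in range(N):
        mid = (j + 0.5) / N
        for node in (-g, g):
            total += half * peano_kernel(a, N, mid + half * node) ** 2
    return total


def closed_form_squared(a, N):
    """Square of the worst-case error formula of Example 1.4."""
    return (1.0 - a ** (2 * N)) * (1.0 + a + a * a) / (3.0 * N * (1.0 - a * a))


numeric = kernel_norm_squared(0.3, 20)
exact = closed_form_squared(0.3, 20)
```

Since $K^2$ is a polynomial of degree at most $2n - 2 = 2$ on each panel, the two-point rule integrates it exactly, so the two numbers agree up to rounding. This version costs $nN$ filter runs, which is the expense the recursive shortcut avoids.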

However, since the digital filter is recursive we will indicate
below that we need only apply it to $n$ functions! This is of course
a considerable saving since one would like to choose $N$ large in order
to gauge the long term effects of the filter. Note that $f(x) \equiv (x - t)_+^{n-1}$
vanishes for all $x < t$ so that

(4)  $D_N(\beta, g) = D_{N-j}(\beta, f)$

where $g(x) = (x - (j/N + t))_+^{n-1}$. In particular, (4) indicates that all
the information needed to compute $K_\beta^N$ at the scaled Gaussian points
can be obtained by applying the filter to the $n$ functions

$\{ (x - \tau_i)_+^{n-1} \}_{i=1}^{n}$

where the $\tau_i$ are the scaled Gaussian points in the interval $[0, 1/N]$.
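The shift property behind (4) is easy to observe numerically. The Python sketch below again uses the filter of Example 1.4 with $n = 2$ as a stand-in for the general filter (1); the function names are hypothetical. It keeps every output of one filter run and compares it with the last output for a shifted test function.

```python
def run_filter(a, N, f):
    """Return [f'_0, ..., f'_N] for f'_j = (1-a) N (f_j - f_{j-1}) + a f'_{j-1}.

    Samples and outputs with negative index are taken to be zero.
    """
    out = []
    prev_f, deriv = 0.0, 0.0
    for j in range(N + 1):
        fj = f(j / N)
        deriv = (1.0 - a) * N * (fj - prev_f) + a * deriv
        prev_f = fj
        out.append(deriv)
    return out


def truncated_power(shift):
    """The test function (x - shift)_+ used for n = 2."""
    return lambda x: max(x - shift, 0.0)


a, N, j = 0.4, 16, 5
t = 0.3 / N                                    # a point in the first panel [0, 1/N]
outputs_f = run_filter(a, N, truncated_power(t))
outputs_g = run_filter(a, N, truncated_power(j / N + t))
```

Here `outputs_g[N]` agrees with `outputs_f[N - j]` up to rounding, so a single run on the function anchored in the first panel supplies the kernel values for every panel.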

We conclude this section with the following observation. Our
original comparison criteria led us to consider

(5)  $\sup_{f \in B_n} |D_N(\beta, f) - f'(1)|$

as a measure of a filter's performance. But as we just indicated above
we actually only need to apply the filter to $n$ test functions in
order to recover (5) as opposed to all the functions in $B_n$. This
appears to be a remarkable savings.
ABSTRACT. The classical Nystrom method for the numerical solution of
Fredholm integral equations of second kind consists of numerical integration,
collocation, and interpolation. The approximate solution obtained by this
procedure is shown to be identical to the solution of certain finite-rank
integral equations with kernels belonging to a specified class $\{K_n\}$, and
thus has minimal error with respect to approximation of the original equation
over this class. A computable (but in general nonoptimal) error bound for the
Nystrom approximate solution can be obtained on the basis of how well a specific
finite-rank integral operator with kernel in $\{K_n\}$ approximates the integral
operator in the Fredholm equation being solved numerically.

1. THE NYSTROM METHOD. The linear Fredholm integral equation of second
kind,

(1.1)  $x(s) - \lambda \int_0^1 K(s,t)\,x(t)\,dt = y(s), \quad 0 \le s \le 1,$

arises in the solution of boundary value problems for ordinary and partial
differential equations, and other important applications. Given the function
$y(s)$ and the kernel $K(s,t)$, one problem in connection with (1.1) is to obtain
the solution function $x(s)$, at least for values of the parameter $\lambda$ for
which it is unique. Another problem, usually posed for the homogeneous
equation ($y(s) \equiv 0$), is to find the eigenvalues and eigenfunctions of the
kernel $K(s,t)$, that is, values $\lambda^*$ of the parameter $\lambda$ for which the
homogeneous equation has corresponding nontrivial solutions $x^*(s) \not\equiv 0$.
Fredholm [6] obtained solutions to these problems in terms of infinite series
in $\lambda$, with coefficients expressed as repeated integrals of larger and larger
determinants. As in the case of the analogous Cramer's rule for linear
algebraic systems, Fredholm's formulas are satisfactory from a theoretical
standpoint, but are usually unsuitable for practical computation. To remedy
this situation, various numerical methods to obtain approximate solutions have
been developed [4], including the simple and effective procedure due to
Nystrom [13]. Nystrom's method consists of three steps: (i) numerical
integration, which replaces the integral in (1.1) by a finite sum to obtain an
approximating functional equation; (ii) collocation, which gives a finite
linear algebraic system for values $z_1, z_2, \ldots, z_n$ of an approximate solution
of (1.1) at $n$ points $t_1, t_2, \ldots, t_n$; and (iii) interpolation, which uses the
values found by collocation to construct a function $z(s)$ on the entire
interval $0 \le s \le 1$ which is an approximate solution of (1.1) such that
$z(t_i) = z_i$, $i = 1, 2, \ldots, n$. These steps will be explained in greater detail
in the following sections.

Sponsored by the United States Army under Contract No. DAAG29-75-C-0024.

The simplicity of the Nystrom method stems from the fact that it is
basically an interpolatory procedure; only values of the kernel $K(s,t)$ and
the function $y(s)$ are needed at the nodes of the numerical integration
formula being used. In terms of accuracy, on the other hand, it will be shown
that the Nystrom method is optimal with respect to a certain class of
approximation procedures. Estimates will be derived for the norm $\|x - z\|$ of
the error in a normed linear function space. In this connection, it will be
convenient to use operator notation, in terms of which (1.1) may be written in
the form

(1.2)  $(I - \lambda K)x = y,$

where $I$ denotes the identity operator, and $K$ the linear integral operator
with kernel $K(s,t)$.

2. NUMERICAL INTEGRATION. A rule for numerical integration with nodes
$t_1, t_2, \ldots, t_n$ and weights $w_1, w_2, \ldots, w_n$ may be expressed in terms of the
linear functional

(2.2)  $R_n(f) = \sum_{j=1}^{n} f(t_j)\,w_j.$

Thus, one has the quadrature formula

(2.3)  $\int_0^1 f(t)\,dt = R_n(f) + E_n[f]$

for the integral of a function $f(t)$ over the interval $0 \le t \le 1$, where $E_n$
denotes the (linear) error functional associated with the rule $R_n$. Attention
will be restricted here to quadrature formulas of order $m$ and interpolation
type, for which $E_n[f] = 0$ if $f(t)$ is a polynomial in $t$ of degree $m - 1$
or less [10, p. 162]. In addition, it will be assumed that the rules for
numerical integration considered have positive weights $w_i > 0$, $i = 1, 2, \ldots, n$.
This will guarantee that the quadrature formula of interpolation type is
convergent in the sense that $\lim_{n \to \infty} E_n[f] = 0$, at least for continuous
integrands $f(t)$ [10, p. 186].

Applying the quadrature formula (2.3) to the integral equation (1.1) gives
the equivalent equation

(2.4)  $x(s) - \lambda \sum_{j=1}^{n} K(s,t_j)\,w_j\,x(t_j) = y(s) + \lambda E_n[Kx](s),$

where the error term has been moved to the right-hand side. The form of (2.4)
suggests that an approximate solution $z(s)$ of the integral equation can be
obtained by solving the functional equation

(2.5)  $z(s) - \lambda \sum_{j=1}^{n} K(s,t_j)\,w_j\,z(t_j) = y(s),$

which will be called the Nystrom equation resulting from the application of the
rule of numerical integration $R_n$ to the Fredholm integral equation (1.1).

3. COLLOCATION. The left-hand side of equation (2.4) involves the values
$x(t_j)$ of the solution $x(s)$ of the integral equation (1.1) at the nodes
$t_1, t_2, \ldots, t_n$ of the rule of numerical integration $R_n$. Setting $s = t_i$ in
equation (2.4) gives the system of equations

(3.1)  $x(t_i) - \lambda \sum_{j=1}^{n} K(t_i,t_j)\,w_j\,x(t_j) = y(t_i) + \lambda E_n[Kx](t_i),$

$i = 1, 2, \ldots, n$, for these values. Approximations $z_i$ to $x(t_i)$ may be
obtained by discarding the error term in (3.1) and solving the resulting
collocation equations

(3.2)  $z_i - \lambda \sum_{j=1}^{n} K(t_i,t_j)\,w_j z_j = y_i, \quad i = 1, 2, \ldots, n,$

where $y_i = y(t_i)$. Given the rule of numerical integration $R_n$, setting up
and solving the finite linear algebraic system (3.2) (or the corresponding
eigenvalue-eigenvector problem) can be carried out readily with the aid of an
electronic computer, even for moderately large values of $n$.
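A minimal Python sketch of the collocation step follows. It assembles the coefficient matrix of (3.2) and solves the system by Gaussian elimination with partial pivoting. The function names, the two-point Gauss rule on $[0,1]$, and the test problem are illustrative assumptions rather than anything prescribed in the text.

```python
import math


def gauss_2pt():
    """Two-point Gauss rule on [0, 1]: nodes and (positive) weights."""
    g = 0.5 / math.sqrt(3.0)
    return [0.5 - g, 0.5 + g], [0.5, 0.5]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def nystrom_collocation(kernel, y, lam, nodes, weights):
    """Solve z_i - lam * sum_j K(t_i, t_j) w_j z_j = y(t_i) for z_1..z_n."""
    n = len(nodes)
    a = [[(1.0 if i == j else 0.0) - lam * kernel(nodes[i], nodes[j]) * weights[j]
          for j in range(n)] for i in range(n)]
    return solve_linear(a, [y(t) for t in nodes])


# Test problem with known solution x(s) = s: K(s,t) = s*t, lam = 1, y(s) = 2s/3.
nodes, weights = gauss_2pt()
z = nystrom_collocation(lambda s, t: s * t, lambda s: 2.0 * s / 3.0, 1.0,
                        nodes, weights)
```

For this test problem the integrand $K(s,t)x(t) = s t^2$ is integrated exactly by the two-point Gauss rule, so the error term in (3.1) vanishes and the computed values reproduce $x(t_i) = t_i$ up to rounding.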

Some other formulations of the system (3.2) may be useful in particular
instances. For example, the quantities $\xi_i = w_i z_i$ may be needed for numerical
or theoretical purposes. Multiplying the equations (3.2) by $w_1, w_2, \ldots, w_n$ in
turn and setting $\eta_i = w_i y_i$ gives the system

(3.3)  $\xi_i - \lambda \sum_{j=1}^{n} w_i K(t_i,t_j)\,\xi_j = \eta_i, \quad i = 1, 2, \ldots, n,$

which can be solved directly for $\xi_1, \xi_2, \ldots, \xi_n$. Another case would be that
the kernel $K(s,t)$ of the integral equation (1.1) is symmetric, $K(s,t) = K(t,s)$
(or Hermitian, $K(s,t) = \overline{K(t,s)}$), and it is desired to carry this property
over to the coefficient matrix of the finite system, which might be particularly
convenient when calculating approximate eigenvalues and eigenfunctions. One
way to do this is to use numerical integration rules of Chebyshev type with
equal weights $w_i = 1/n$, $i = 1, 2, \ldots, n$ [10, pp. 213-216]. A symmetrization
procedure which works for arbitrary rules with positive weights is to multiply
the $i$th equation of (3.2) by $\sqrt{w_i}$. In terms of $\zeta_i = \sqrt{w_i}\,z_i$, $\theta_i = \sqrt{w_i}\,y_i$,
the resulting system is

(3.4)  $\zeta_i - \lambda \sum_{j=1}^{n} \sqrt{w_i}\,K(t_i,t_j)\,\sqrt{w_j}\,\zeta_j = \theta_i, \quad i = 1, 2, \ldots, n,$

which has a symmetric (or Hermitian) coefficient matrix if $K(s,t)$ has the
corresponding property.

4. INTERPOLATION. In order to keep the system of collocation equations
small, Nystrom [13] recommended the use of highly accurate rules of numerical
integration, such as Gaussian quadrature. Even when using electronic computers,
there are advantages in speed and accuracy to be gained in this way. However,
the Gaussian nodes (see [9, p. 28H] for a table based on the interval
$0 \le t \le 1$) are not always the points at which approximate values of the
solution of the integral equation are desired. In other applications, what is
needed is not just a finite set of values $z_1, z_2, \ldots, z_n$, but rather a function
$z(s)$ which is an approximate solution on the entire interval $0 \le s \le 1$.
These requirements can be met by some method for interpolation (or extrapolation)
from the values computed at the collocation points to other points of the
interval. Frequently used procedures for interpolation are based on piecewise
linear functions, polynomials, or spline functions, which lead to various
representations for an approximate solution of (1.1). Nystrom's interpolation
formula is simply

(4.1)  $z(s) = y(s) + \lambda \sum_{j=1}^{n} K(s,t_j)\,w_j z_j$

in terms of the solutions $z_1, z_2, \ldots, z_n$ of the system of equations (3.2).
It follows from the Nystrom equation (2.5) that $z(t_i) = z_i$, $i = 1, 2, \ldots, n$;
in fact, (4.1) is the unique solution of (2.5) which interpolates the values
computed from (3.2) [4, pp. 88-89]. Formula (4.1) is natural from the
standpoint of simplicity and the fact that it makes use of information from the
original integral equation (values of $y(s)$ and $K(s,t)$) at all points of
the interval, while an arbitrary interpolation formula would only use the
values $z_1, z_2, \ldots, z_n$. This suggests that the Nystrom method should be fairly
accurate, as is usually observed in actual practice.
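The interpolation property $z(t_i) = z_i$ can be observed on a small example. The Python sketch below solves the collocation equations for a two-point rule by Cramer's rule and then builds the Nystrom interpolant; all function names, the kernel $K(s,t) = s + t$, the value $\lambda = 0.5$ and the right-hand side $y(s) = 1$ are illustrative assumptions.

```python
import math


def collocation_2pt(kernel, y, lam, nodes, weights):
    """Solve the 2-by-2 collocation system for (z_1, z_2) by Cramer's rule."""
    t1, t2 = nodes
    w1, w2 = weights
    a11 = 1.0 - lam * kernel(t1, t1) * w1
    a12 = -lam * kernel(t1, t2) * w2
    a21 = -lam * kernel(t2, t1) * w1
    a22 = 1.0 - lam * kernel(t2, t2) * w2
    b1, b2 = y(t1), y(t2)
    det = a11 * a22 - a12 * a21
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det]


def nystrom_interpolant(kernel, y, lam, nodes, weights, z):
    """Return z(s) = y(s) + lam * sum_j K(s, t_j) w_j z_j as a function of s."""
    def z_of_s(s):
        return y(s) + lam * sum(kernel(s, t) * w * zj
                                for t, w, zj in zip(nodes, weights, z))
    return z_of_s


g = 0.5 / math.sqrt(3.0)
nodes, weights = [0.5 - g, 0.5 + g], [0.5, 0.5]   # two-point Gauss rule on [0, 1]
kernel = lambda s, t: s + t
y = lambda s: 1.0
z = collocation_2pt(kernel, y, 0.5, nodes, weights)
z_fn = nystrom_interpolant(kernel, y, 0.5, nodes, weights, z)
```

Evaluating `z_fn` at a node returns the corresponding collocation value up to rounding, while at any other point of the interval it uses the kernel and right-hand side there, not just the values at the nodes.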

Other forms of (4.1) may be useful if equations (3.3) or (3.4) are used
instead of (3.2). In terms of the solutions $\xi_1, \xi_2, \ldots, \xi_n$ of (3.3), the
Nystrom interpolation formula becomes simply

(4.2)  $z(s) = y(s) + \lambda \sum_{j=1}^{n} K(s,t_j)\,\xi_j.$

If equations (3.4) are solved for $\zeta_1, \zeta_2, \ldots, \zeta_n$ instead, then (4.1) may be
written

(4.3)  $z(s) = y(s) + \lambda \sum_{j=1}^{n} K(s,t_j)\,\sqrt{w_j}\,\zeta_j.$
5. SOME METHODS OF ERROR ESTIMATION. In addition to computational
results as furnished by application of Nystrom's method, some indication of
their reliability is desired in many instances. Such error estimates may range
from heuristic to rigorous, and be pointwise, usually for $|x(t_j) - z_j|$,
$j = 1, 2, \ldots, n$, or global, in which case a bound is given for the norm $\|x - z\|$
of the function $x(s) - z(s)$ in some appropriate space. Among the possible
techniques for error estimation are (i) recovery of known solutions; (ii) analysis
of the error functional $E_n$; (iii) use of the theory of collectively compact
operator approximation [1]; and (iv) approximation of the integral equation
by an equation of finite rank. The method presented in this paper falls into
the last category; before going into details, a brief description of the other
procedures will be given.

5.1. Recovery of known solutions. For a given kernel $K(s,t)$, it may
be that equation (1.1) has known solutions $x(s)$ corresponding to particular
choices of the function $y(s)$. In this situation, the approximate solution
$z(s)$ obtained by the Nystrom method can be compared directly with the exact
solution. If the observed accuracy is good, then the numerical results
computed for right-hand sides $y(s)$ corresponding to unknown solutions $x(s)$
may be viewed with some confidence. As pointed out by Nystrom [13], integral
equations with known solutions may be constructed using functions $x(s)$ for
which the transformed function $Kx(s)$ can be calculated explicitly.

The  computational  effort  required  for  this  type  of  error  estimation  is 
not  particularly  great;  with  a given  coefficient  matrix,  the  system  (3.2)  (or 
one  of  the  alternative  forms  (3.3)  or  (3.4))  may  be  solved  simultaneously  for 
several  right-hand  sides,  corresponding  to  known  and  unknown  solutions  of  the 
integral  equation.  The  accuracy  of  the  approximation  obtained  for  the  known 
solutions  may  then  be  used  as  an  indication  of  reliability  of  the  results 
calculated  for  the  unknown  solutions.  It  should  be  emphasized  that  this 
procedure  is  entirely  heuristic,  it  being  possible  that  a certain  choice  of 
$R_n$ would work well for some manufactured equations, but not give accurate
results for an actual problem. It is comforting to note that the illustrations
given by Nystrom [13] show good performance of his method applied to
boundary-value problems arising in mathematical physics, rather than just to
some contrived examples.
5.2. Analysis of the error functional $E_n$. Essentially, the error in
the approximate solution $z(s)$ obtained by the Nystrom method is due to
neglect of terms involving the error functional $E_n$; that is, the replacement
of (2.4) by (2.5) and using (3.2) instead of (3.1). (More precisely, this is
the truncation error of the method; in this discussion, roundoff error in the
actual computation and errors in the data $K(s,t)$ and $y(s)$ will be ignored.)
An expression for the truncation error $x(s) - z(s)$ will now be obtained.
From (2.4) and (2.5),

(5.1)  $x(s) - z(s) = \lambda \left\{ E_n[Kx](s) + \sum_{j=1}^{n} K(s,t_j)\,w_j\,[x(t_j) - z(t_j)] \right\}.$

For simplicity of notation, set

(5.2)  $E_n(s) = E_n[Kx](s), \quad 0 \le s \le 1.$

The errors $x(t_i) - z(t_i)$ at the collocation points satisfy the linear system
of equations

(5.3)  $[x(t_i) - z(t_i)] - \lambda \sum_{j=1}^{n} K(t_i,t_j)\,w_j\,[x(t_j) - z(t_j)] = \lambda E_n(t_i),$

$i = 1, 2, \ldots, n$, by (3.1) and (3.2). Suppose that the coefficient matrix $A$
of (5.3),

(5.4)  $A = (\delta_{ij} - \lambda K(t_i,t_j)\,w_j),$

where $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 0$ if $i \ne j$, $\delta_{ii} = 1$), has the
inverse

(5.5)  $B = (\beta_{ij}) = A^{-1}.$

In terms of the coefficients of $B$, the solutions of (5.3) may be written

(5.6)  $x(t_i) - z(t_i) = \lambda \sum_{j=1}^{n} \beta_{ij} E_n(t_j), \quad i = 1, 2, \ldots, n,$

and (5.1) becomes

(5.7)  $x(s) - z(s) = \lambda \left\{ E_n(s) + \lambda \sum_{j=1}^{n} \sum_{k=1}^{n} K(s,t_j)\,w_j\,\beta_{jk} E_n(t_k) \right\}.$

On the basis of some assumption about the magnitude of $E_n(s) = E_n[Kx](s)$,
for example, $|E_n(s)| \le M$, $0 \le s \le 1$, (5.6) may be used to derive pointwise
error bounds at the collocation points $t_1, t_2, \ldots, t_n$, and (5.7) will furnish
a global error estimate, or pointwise bounds at points other than the nodes
of the rule of numerical integration.

If the integrand $f(t)$ is smooth enough, then the error term $E_n[f]$ of
a formula of order $m$ and interpolation type can be expressed as

(5.8)  $E_n[f] = C_m(n)\,f^{(m)}(\tau_n), \quad 0 < \tau_n < 1,$

where $C_m(n)$ is a known function of $n$ for which $\lim_{n \to \infty} C_m(n) = 0$ in the case
of convergent rules, and the point $\tau_n$ is unknown [9, pp. 108-116; 5,
pp. 217-223]. Thus, the problem of bounding $E_n[Kx](s)$ can be reduced to
estimating $\frac{\partial^m}{\partial t^m} K(s,t)x(t)$. This has the obvious drawback of requiring guesses
for the size of derivatives of the unknown function $x(t)$. Another hindrance
to this method for obtaining error bounds is the possibility that the integrand
$K(s,t)x(t)$ of the integral transform has a low degree of continuity in $t$,
as occurs, for example, in the important applications in which $K(s,t)$ is a
Green's function. Thus, even when $x(t)$ is known to be fairly smooth, one
may have to settle for a small value of $m$ in (5.8), and a correspondingly
large majorant function $C_m(n)$ [5, pp. 257-260].

To  avoid  some  of  these  difficulties,  use  may  be  made  of  a device  which 
gives  good  results  in  many  cases,  although  it  is  not  completely  rigorous.  If 
the  same  quadrature  formula  is  applied  with  two  different  values  of  n,  say 
p and  q,  then 


(5.9)  $E_p[f] / E_q[f] = [C_m(p)/C_m(q)] \, f^{(m)}(\tau_p) / f^{(m)}(\tau_q)$.

Assuming that $f^{(m)}(\tau_p)$ and $f^{(m)}(\tau_q)$ are approximately equal, the ratio of
the errors $E_p[f]$ to $E_q[f]$ can be estimated by the computable value of
$C_m(p)/C_m(q)$. For example, commonly

(5.10)  $C_m(n) = c / n^m$,

where $c$ is a known constant, and

(5.11)  $C_m(q)/C_m(p) = (p/q)^m$.

A frequent choice is $q = 2p$, which leads to the rule of thumb that doubling
the nodes of the rule of numerical integration reduces the truncation error
by a factor of $2^m$ or, in other words, gives $m$ additional binary digits
of accuracy. To apply this or other estimates obtained from (5.11), one may
simply compare the number of digits which agree in the two answers computed,
or apply a suitable extrapolation formula [11, pp. 231-237]. In order to use
this method of error estimation, the system of equations (3.2) (or one of its
equivalent forms) must be set up and solved for at least two values of $n$.
This will involve considerably more labor than the corresponding operation for
integration of a function of a single variable. In addition, more than two
numerical solutions of the integral equation for different values of $n$ may
be required to provide assurance that a theoretical convergence rate as
predicted, for example, by (5.10), is actually being observed empirically.
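The doubling rule can be tried on an ordinary quadrature before it is trusted for an integral equation. The following Python sketch is an illustration only and is not taken from the paper: the function names are hypothetical, the composite trapezoidal rule stands in for the rule of numerical integration, and $n$ is taken to be the number of subintervals, for which $m = 2$ is expected in (5.10). It estimates $m$ from three runs and forms the error estimate implied by (5.11) with $q = 2p$.

```python
import math

def trapezoid(f, n):
    """Composite trapezoidal rule on [0, 1] with n subintervals."""
    h = 1.0 / n
    total = 0.5 * (f(0.0) + f(1.0))
    for i in range(1, n):
        total += f(i * h)
    return h * total

def observed_order(f, p):
    """Estimate m in C_m(n) = c / n**m from runs with p, 2p and 4p subintervals."""
    r1, r2, r4 = trapezoid(f, p), trapezoid(f, 2 * p), trapezoid(f, 4 * p)
    # Successive differences shrink by about 2**m when the rate (5.10) holds.
    return math.log2(abs(r1 - r2) / abs(r2 - r4))

def extrapolated_error(f, p, m):
    """Estimated error (exact minus computed) of the 2p run, assuming E_p / E_2p = 2**m."""
    return (trapezoid(f, 2 * p) - trapezoid(f, p)) / (2 ** m - 1)

f = math.exp                                   # test integrand with known integral e - 1
m_est = observed_order(f, 8)                   # empirical convergence order
err_est = extrapolated_error(f, 8, 2)          # estimate of the error at n = 16
true_err = (math.e - 1.0) - trapezoid(f, 16)   # actual error at n = 16
```

Comparing `m_est` with the theoretical order is the empirical check recommended above; the three runs it needs correspond to the "more than two numerical solutions" mentioned in the text.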

5.3.  Collectively compact operator approximation theory. Error estimates
for the Nystrom method can be obtained on the basis of the theory of collectively
compact operator approximation developed by Anselone [1]. As this application
of the general theory is also explained clearly and in detail in the book by
Atkinson [4, pp. 88-104], only the essential features of this approach will
be summarized here. The setting for this technique, which provides rigorous
results, is some Banach space of functions $x(s)$ defined on the interval
$0 \le s \le 1$. Usual examples are the space $C[0,1]$ of continuous functions with
the maximum norm, or $L_2[0,1]$, the Hilbert space of functions with Lebesgue
integrable squares and the norm defined in the ordinary way as the square root
of the integral of the square of the function. The norm of an element $x$ of
the space will be denoted in the customary fashion by $\|x\|$. In order to avoid
possible confusion, a different notation will be used for the operator norm
of a linear operator $L$ that maps the space into itself, which is defined by

(5.12)  $M(L) = \sup_{\|x\|=1} \|Lx\|$.

(Often $\|L\|$ is written for $M(L)$, as in the literature cited [1,4], it being
clear from the context whether the element or the operator norm is meant.)

For example, if $K$ is a linear integral operator in the space $C[0,1]$ with
continuous kernel $K(s,t)$, then

(5.13)  $M(K) = \max_{0 \le s \le 1} \int_0^1 |K(s,t)|\,dt$.

In $L_2[0,1]$, if $K(s,t)$ is symmetric and positive definite with eigenvalues
$0 < \lambda_1 \le \lambda_2 \le \cdots$, then

(5.14)  $M(K) = 1/\lambda_1$.

For arbitrary integral operators $K$ in $L_2[0,1]$, one has

(5.15)  $M(K) \le \{ \int_0^1 \int_0^1 |K(s,t)|^2\,ds\,dt \}^{1/2}$,


which  is  computationally  more  tractable  than  expressions  of  the  form  (5.14); 
however,  the  right-hand  side  of  inequality  (5.15)  may  overestimate  M(K) 
grossly. 
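As a worked illustration (this example is not in the original text), the three expressions (5.13)-(5.15) can be evaluated in closed form for the Green's function kernel $K(s,t) = \min(s,t)$, whose eigenvalues in the sense of $x = \lambda K x$ are $\lambda_k = ((2k-1)\pi/2)^2$:

```latex
\begin{align*}
\text{(5.13), in } C[0,1]:\quad
  M(K) &= \max_{0 \le s \le 1} \int_0^1 \min(s,t)\,dt
        = \max_{0 \le s \le 1} \Bigl( s - \tfrac{1}{2}s^2 \Bigr) = \tfrac{1}{2}, \\
\text{(5.14), in } L_2[0,1]:\quad
  M(K) &= \frac{1}{\lambda_1} = \frac{4}{\pi^2} \approx 0.4053, \\
\text{(5.15), in } L_2[0,1]:\quad
  M(K) &\le \Bigl\{ \int_0^1\!\!\int_0^1 \min(s,t)^2\,ds\,dt \Bigr\}^{1/2}
        = \Bigl( \tfrac{1}{6} \Bigr)^{1/2} \approx 0.4082 .
\end{align*}
```

For this kernel the bound (5.15) exceeds the exact value (5.14) by less than one percent, because the sum of the $\lambda_k^{-2}$ is dominated by its first term; the overestimate grows when several eigenvalues are of comparable size.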

Fundamental to the Anselone theory is the definition of the numerical
integration operator $Q_n$ by

(5.16)  $Q_n x = R_n[Kx]$,

or

(5.17)  $Q_n x(s) = \sum_{j=1}^{n} K(s,t_j) w_j x(t_j)$,

for the given nodes and weights of the rule of numerical integration considered.
A straightforward application of approximation theory is not possible at this
point, as the operator $Q_n$ does not approximate $K$ in the operator norm; in
fact [4, p. 90],

(5.18)  $M(K - Q_n) \ge M(K)$

independently of the value of $n$. However, provided that the set of numerical
integration operators $\{Q_n\}$ has a technical property known as collective
compactness [1, pp. 3-4], error estimates may be obtained in terms of the
operator norm of

(5.19)  $\Delta_n = (Q_n - K)K = Q_n K - K^2$,

which has the kernel

(5.20)  $\Delta_n(s,t) = \sum_{j=1}^{n} K(s,t_j) w_j K(t_j,t) - \int_0^1 K(s,r) K(r,t)\,dr$.

In this formulation, the Nystrom approximation $z(s)$ is obtained by solving
the equation

(5.21)  $(I - \lambda Q_n) z = y$.

In the nonsingular case, $(I - \lambda Q_n)^{-1}$ exists, and

(5.22)  $z = (I - \lambda Q_n)^{-1} y$.

Furthermore [1, pp. 11-12], if

(5.23)  $\delta_n = M((I - \lambda Q_n)^{-1}) \, M(\lambda^2 \Delta_n) < 1$,

then $(I - \lambda K)^{-1}$ exists (that is, the original integral equation (1.1) has a
unique solution $x(s)$), and

(5.24)  $\|x - z\| \le \frac{M((I - \lambda Q_n)^{-1}) \{ \|\lambda Q_n y - \lambda K y\| + \delta_n \|z\| \}}{1 - \delta_n}$.

As the property of collective compactness of the set of operators $\{Q_n\}$
and the fact that $\lim_{n \to \infty} \delta_n = 0$ (including a rate of convergence) may be verified
fairly easily for convergent numerical integration rules commonly used in
practice, inequality (5.24) provides an error bound which is both rigorous and
computable. Some difficulty may be encountered in obtaining precise bounds,
particularly if the transformed function $Ky$ and the kernel of the iterated
operator $K^2$ cannot be calculated explicitly; however, satisfactory overestimates
for the quantities on the right-hand side of inequality (5.24) can usually be
obtained. It is worth noting that if $K(s,t)$ is symmetric (or Hermitian),
then it follows directly from (5.20) that the kernel $\Delta_n(s,t)$ defined there
has the same property. This may be useful in $L_2[0,1]$, as sharper bounds for
the operator norm $M(\Delta_n)$ than given by (5.15) may be computable in terms of
approximate eigenvalues of $\Delta_n(s,t)$.

In order for the error bound given by (5.24) to be small, the function
$Q_n y$ must be a good approximation to the transformed function $Ky$, as measured
by the norm of their difference $\|Q_n y - Ky\|$, and the finite rank operator
$Q_n K$ must be close to the iterated operator $K^2$ in the operator norm topology.
It will be shown below that the Nystrom method can also be formulated in terms
of finite rank operators which approximate $K$ directly, leading to
simpler error bounds than (5.24).
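The solution step (5.21)-(5.22) can be made concrete with a short Python sketch. Everything named here is an assumption for illustration rather than part of the paper: the helper names are hypothetical, the composite Simpson rule is chosen as the rule of numerical integration, and the test problem $K(s,t) = st$, $y(s) = s$, $\lambda = 1$ is chosen because its exact solution $x(s) = 1.5s$ is known and Simpson's rule integrates the resulting quadratic integrand exactly.

```python
def simpson_rule(n):
    """Nodes and weights of the composite Simpson rule on [0, 1]; n must be odd."""
    h = 1.0 / (n - 1)
    nodes = [i * h for i in range(n)]
    weights = [h / 3.0 * (1 if i in (0, n - 1) else 4 if i % 2 else 2)
               for i in range(n)]
    return nodes, weights

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; a and b are left unchanged."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def nystrom(kernel, y, lam, nodes, weights):
    """Solve (I - lam*Q_n) z = y at the nodes and return the interpolated z(s)."""
    n = len(nodes)
    # Coefficient matrix (delta_ij - lam * K(t_i, t_j) * w_j).
    a = [[(1.0 if i == j else 0.0) - lam * kernel(nodes[i], nodes[j]) * weights[j]
          for j in range(n)] for i in range(n)]
    z_nodes = solve_linear(a, [y(t) for t in nodes])

    def z(s):
        # z(s) = y(s) + lam * sum_j K(s, t_j) w_j z(t_j)
        return y(s) + lam * sum(kernel(s, t) * w * zj
                                for t, w, zj in zip(nodes, weights, z_nodes))
    return z

nodes, weights = simpson_rule(5)
z = nystrom(lambda s, t: s * t, lambda s: s, 1.0, nodes, weights)
```

Because the quadrature is exact for this particular problem, `z(s)` reproduces $1.5s$ up to rounding; for a general kernel the difference between `z` and the true solution is what the bound (5.24) is meant to control.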

5.4.  Approximation by equations of finite rank. A standard procedure
in the numerical analysis of the integral equation (1.1) is to approximate it
by an equation

(5.25)  $z(s) - \lambda \int_0^1 L(s,t) z(t)\,dt = y(s)$,  $0 \le s \le 1$,

which can be solved for $z(s)$ [15]. Here, one looks for estimates of $\|x - z\|$
in terms of $M(K - L)$, $M((I - \lambda L)^{-1})$, and perhaps $\|z\|$, it being assumed
that these values or upper bounds for them are computable. For example, in the
nonsingular case that $(I - \lambda K)^{-1}$ and $(I - \lambda L)^{-1}$ exist, it is easy to verify
the identity

(5.26)  $(I - \lambda K)^{-1} - (I - \lambda L)^{-1} = \lambda (I - \lambda L)^{-1} (K - L) (I - \lambda K)^{-1}$.

Operating on $y$ with both sides of (5.26) gives

(5.27)  $x - z = \lambda (I - \lambda L)^{-1} (K - L) x$,

and thus

(5.28)  $\|x - z\| / \|x\| \le |\lambda| \, M((I - \lambda L)^{-1}) \, M(K - L)$

gives a computable bound for the relative error $\|x - z\| / \|x\|$. By symmetry,

(5.29)  $\|x - z\| / \|z\| \le |\lambda| \, M((I - \lambda K)^{-1}) \, M(K - L)$,

which requires an estimate for $M((I - \lambda K)^{-1})$ to be computable. However, if

(5.30)  $\theta(K - L) = M((I - \lambda L)^{-1}) \, M(K - L) < 1/|\lambda|$,

then $(I - \lambda K)^{-1}$ exists, and

(5.31)  $M((I - \lambda K)^{-1}) \le \frac{M((I - \lambda L)^{-1})}{1 - |\lambda| \, \theta(K - L)}$

[16, pp. 50-57]. Substitution of (5.31) into (5.29) yields

(5.32)  $\|x - z\| \le \frac{|\lambda| \, \theta(K - L)}{1 - |\lambda| \, \theta(K - L)} \|z\|$,

a computable bound for the absolute error $\|x - z\|$ or, after division by $\|z\|$,
for the relative error. The error bounds (5.28) and (5.32) are simpler than (5.24).
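For completeness, the verification of the identity (5.26) and the step to (5.28) can be written out as follows. This is a routine derivation supplied here for illustration; it uses only the existence of both inverses and the relations $x = (I - \lambda K)^{-1} y$, $z = (I - \lambda L)^{-1} y$.

```latex
\begin{align*}
(I - \lambda K)^{-1} - (I - \lambda L)^{-1}
  &= (I - \lambda L)^{-1} \bigl[ (I - \lambda L) - (I - \lambda K) \bigr] (I - \lambda K)^{-1} \\
  &= \lambda\, (I - \lambda L)^{-1} (K - L) (I - \lambda K)^{-1}, \\
x - z &= \lambda\, (I - \lambda L)^{-1} (K - L)\, x, \\
\|x - z\| &\le |\lambda| \, M\bigl((I - \lambda L)^{-1}\bigr) \, M(K - L) \, \|x\| .
\end{align*}
```

Interchanging the roles of $K$ and $L$ (and hence of $x$ and $z$) in the same argument gives the companion bound (5.29).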

The error analysis of the Nystrom method conducted here will be based on
approximation of (1.1) by equations (5.25) with kernels of finite rank, that
is, $L(s,t) = F_n(s,t)$, where

(5.33)  $F_n(s,t) = \sum_{j=1}^{n} u_j(s) v_j(t)$,

and $\{u_1(s),u_2(s),\ldots,u_n(s)\}$, $\{v_1(t),v_2(t),\ldots,v_n(t)\}$ are sets of linearly
independent functions. More particularly, the choice $u_j(s) = K(s,t_j) w_j$,
$j = 1,2,\ldots,n$, will be made for the Nystrom method, which leads to approximate
kernels of the form

(5.34)  $K_n(s,t) = \sum_{j=1}^{n} K(s,t_j) w_j v_j(t)$,

and error bounds corresponding to (5.28) and (5.32) with $L = K_n$.

Error analysis of a number of methods for the numerical solution of
integral equations can be carried out on the basis of approximation by finite
rank equations with kernels (5.33) or (5.34), including collocation-interpolation
procedures which have the Nystrom method as a special case. The papers by
Phillips [14], Noble [12], and Sloan [17] describe various methods for which
the present approach is suitable. One of the principal results of this paper
is to show that the Nystrom method is of optimal accuracy with respect to a
class of approximations of the kernel $K(s,t)$ by finite rank kernels of the
form (5.34).

6.  SOLUTION OF FINITE RANK EQUATIONS. In 1907, Goursat [8] gave the
recipe for solving Fredholm integral equations of second kind with finite
rank kernels (5.33), that is, equations of the form

(6.1)  $z(s) - \lambda \int_0^1 \sum_{j=1}^{n} u_j(s) v_j(t) z(t)\,dt = y(s)$.

In terms of the ordinary inner product

(6.2)  $\langle f,g \rangle = \int_0^1 f(t) g(t)\,dt$,

which will be assumed to be defined whether or not the function space considered
is a Hilbert space, equation (6.1) may be written

(6.3)  $z(s) - \lambda \sum_{j=1}^{n} u_j(s) \langle v_j,z \rangle = y(s)$.

By taking the inner products of equation (6.3) with $v_1(s),v_2(s),\ldots,v_n(s)$
in turn, one obtains the finite system of linear algebraic equations

(6.4)  $\langle v_i,z \rangle - \lambda \sum_{j=1}^{n} \langle v_i,u_j \rangle \langle v_j,z \rangle = \langle v_i,y \rangle$,

$i = 1,2,\ldots,n$, for the unknown inner products $\langle v_1,z \rangle, \langle v_2,z \rangle, \ldots, \langle v_n,z \rangle$. In
terms of the solutions of (6.4), the solution of (6.3) and hence of the finite
rank integral equation (6.1) may be written as

(6.5)  $z(s) = y(s) + \lambda \sum_{j=1}^{n} u_j(s) \langle v_j,z \rangle$.

There is an obvious similarity between (6.3) and the Nystrom equation
(2.5), the linear algebraic system (6.4) and the collocation equations (3.2),
and finally between the solution (6.5) and the interpolation formula (4.1).
These similarities will be exploited below to obtain error bounds. The
principal difference between the two sets of equations is that the Nystrom
method makes use of interpolation data, that is, values of the functions
involved at specified points, while the inner products (6.2) involved in the
Goursat equations (6.3)-(6.5) require values of the functions considered on
the entire interval $0 \le t \le 1$ (except possibly for sets of measure zero),
which will be called approximation data.
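The Goursat recipe (6.3)-(6.5) is short enough to sketch in Python. The sketch below is illustrative and makes several assumptions not found in the paper: the function names are hypothetical, the functions $u_j$, $v_j$ and $y$ are restricted to polynomials stored as coefficient lists so that the inner products (6.2) are exact rationals, and the test kernel $F_2(s,t) = 1 + st$ with $y(s) = s$ and $\lambda = 1/2$ is an arbitrary choice.

```python
from fractions import Fraction

def inner(p, q):
    """Inner product (6.2) of polynomials given as coefficient lists [a0, a1, ...]."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def solve_exact(a, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def goursat(us, vs, y, lam):
    """Solve (6.4) for c_j = <v_j, z>; return the c_j and z(s) built from (6.5)."""
    n = len(us)
    a = [[(1 if i == j else 0) - lam * inner(vs[i], us[j]) for j in range(n)]
         for i in range(n)]
    c = solve_exact(a, [inner(v, y) for v in vs])

    def z(s):
        y_s = sum(coef * s ** k for k, coef in enumerate(y))
        u_s = [sum(coef * s ** k for k, coef in enumerate(u)) for u in us]
        return y_s + lam * sum(uj * cj for uj, cj in zip(u_s, c))
    return c, z

# Kernel F(s,t) = 1 + s*t: u_1 = 1, u_2 = s, v_1 = 1, v_2 = t; right-hand side y(s) = s.
c, z = goursat([[1], [0, 1]], [[1], [0, 1]], [0, 1], Fraction(1, 2))
```

For this test problem the solution is $z(s) = (12 + 24s)/17$, which can be confirmed by substitution into (6.1). The inner products here use approximation data in the sense just described; the Nystrom method replaces them by function values at the nodes.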

In the nonsingular case, it is customary to express the solution of the
integral equation (1.1) as

(6.6)  $x(s) = y(s) + \lambda \int_0^1 \Gamma(s,t;\lambda) y(t)\,dt$,

where $\Gamma(s,t;\lambda)$ is called the resolvent kernel of the kernel $K(s,t)$. The
resolvent kernel $G_n(s,t;\lambda)$ of the finite rank kernel $F_n(s,t)$ defined by
(5.33) may be expressed in terms of the inverse matrix $B = (\beta_{ij}) = A^{-1}$ of
the coefficient matrix

(6.7)  $A = (\delta_{ij} - \lambda \langle v_i,u_j \rangle)$

of the linear system (6.4) as

(6.8)  $G_n(s,t;\lambda) = \sum_{j=1}^{n} \sum_{k=1}^{n} u_j(s) \beta_{jk} v_k(t)$.

Equation (6.8) follows directly by substitution of the solutions

(6.9)  $\langle v_j,z \rangle = \sum_{k=1}^{n} \beta_{jk} \langle v_k,y \rangle$,  $j = 1,2,\ldots,n$,

of the system (6.4) into (6.5). There are a number of ways to express the
resolvent kernel (6.8) as a kernel of rank $n$ in the form (5.33). For
example, defining

(6.10)  $V_j(t) = \sum_{k=1}^{n} \beta_{jk} v_k(t)$,  $j = 1,2,\ldots,n$,

one obtains

(6.11)  $G_n(s,t;\lambda) = \sum_{j=1}^{n} u_j(s) V_j(t)$.

Other expressions may be obtained by summing over $j$ first, or by manipulation
of the matrix $B$ such as reduction to a canonical form, LU or singular
value decomposition, etc., in order to write the double sum in (6.8) as a
single summation of products of functions $U_j(s) V_j(t)$, $j = 1,2,\ldots,n$. In some
applications, one of these alternative forms may be more useful than (6.11).

In the singular case, $\lambda$ is an eigenvalue of the kernel $F_n(s,t)$, and
the homogeneous system

(6.12)  $\langle v_i,z \rangle - \lambda \sum_{j=1}^{n} \langle v_i,u_j \rangle \langle v_j,z \rangle = 0$,  $i = 1,2,\ldots,n$,

corresponding to (6.4) has $d$ linearly independent sets of solutions
$\langle v_1,z \rangle_k, \langle v_2,z \rangle_k, \ldots, \langle v_n,z \rangle_k$, $k = 1,2,\ldots,d$, where $d = n - r$ is called the
defect of the matrix $A$ of coefficients of (6.12), $r$ being its rank. In
terms of these solutions of (6.12), $d$ linearly independent (right) eigenfunctions
$z_1(s),z_2(s),\ldots,z_d(s)$ of $F_n(s,t)$ are

(6.13)  $z_k(s) = \lambda \sum_{j=1}^{n} u_j(s) \langle v_j,z \rangle_k$,  $k = 1,2,\ldots,d$.

Any given right eigenfunction of $F_n(s,t)$ may be expressed as a linear
combination of the functions (6.13).

By the Fredholm theory [6], if $\lambda$ is an eigenvalue, then the transposed
homogeneous system

(6.14)  $\langle p,u_j \rangle - \lambda \sum_{i=1}^{n} \langle p,u_i \rangle \langle v_i,u_j \rangle = 0$,  $j = 1,2,\ldots,n$,

has $d$ linearly independent sets of solutions $\langle p,u_1 \rangle_k, \langle p,u_2 \rangle_k, \ldots, \langle p,u_n \rangle_k$,
$k = 1,2,\ldots,d$, corresponding to which

(6.15)  $p_k(t) = \lambda \sum_{i=1}^{n} \langle p,u_i \rangle_k v_i(t)$,  $k = 1,2,\ldots,d$,

form a complete set of linearly independent left eigenfunctions of the kernel
$F_n(s,t)$; that is, any solution $p(t)$ of the transposed homogeneous integral
equation

(6.16)  $p(t) - \lambda \int_0^1 p(s) \sum_{i=1}^{n} u_i(s) v_i(t)\,ds = 0$,  $0 \le t \le 1$,

can be expressed as a linear combination of the functions (6.15). In this case,
the inhomogeneous system (6.4) has solutions if and only if

(6.17)  $\sum_{i=1}^{n} \langle p,u_i \rangle_k \langle v_i,y \rangle = 0$,  $k = 1,2,\ldots,d$,

or, in integral form,

(6.18)  $\int_0^1 [ \sum_{i=1}^{n} \langle p,u_i \rangle_k v_i(t) ] y(t)\,dt = \frac{1}{\lambda} \int_0^1 p_k(t) y(t)\,dt = 0$,

$k = 1,2,\ldots,d$. In other words, the necessary and sufficient condition that
(6.1) be solvable in case $\lambda$ is an eigenvalue is that $y(t)$ be orthogonal to
all solutions of the transposed homogeneous integral equation (6.16). Assuming
that this holds, given a particular solution $z_0(s)$ of (6.1), the general
solution may be written

(6.19)  $z(s) = z_0(s) + \sum_{k=1}^{d} a_k z_k(s)$,

where $a_1,a_2,\ldots,a_d$ are arbitrary.

Thus, approximation of the integral equation (1.1) by a finite rank equation
(6.1) provides a way to construct an approximate solution and resolvent
kernel in the nonsingular case, and approximate eigenvalues and left and right
eigenfunctions from the corresponding homogeneous equations. To handle the
inhomogeneous equation in the singular case, it may be necessary to approximate
also the right-hand side of (1.1) by a function which satisfies (6.18).

7.  IDENTIFICATION OF THE NYSTROM METHOD WITH FINITE RANK APPROXIMATIONS.
On the basis of the above, it can be seen that the results of the Nystrom
method are identical to those obtained by approximation of (1.1) by finite rank
equations with kernels $K_n(s,t)$ of the form (5.34), provided that the functions
$v_1(t),v_2(t),\ldots,v_n(t)$ are chosen to satisfy the $n^2 + n$ conditions

(7.1)  $\langle v_i,K_j \rangle = \int_0^1 v_i(t) K(t,t_j)\,dt = K(t_i,t_j)$,  $i,j = 1,2,\ldots,n$,

where $K_j(t) = K(t,t_j)$, and

(7.2)  $\langle v_i,y \rangle = \int_0^1 v_i(t) y(t)\,dt = y(t_i)$,  $i = 1,2,\ldots,n$.

There are many ways to find functions $v_i(t)$, $i = 1,2,\ldots,n$, which
satisfy (7.1) and (7.2). For example, suppose that a reproducing kernel
$R(s,t)$ [3; 7, pp. 146-160] is known for a space containing the functions
$K_j(s)$, $j = 1,2,\ldots,n$ (or, more generally, the functions $K_t(s) = K(s,t)$
for $0 \le t \le 1$) and $y(s)$; then one may take

(7.3)  $v_i(t) = R(t_i,t)$,  $i = 1,2,\ldots,n$.

A less exotic way to determine suitable $v_1(t),v_2(t),\ldots,v_n(t)$ would be as
linear combinations of functions for which the integrals in (7.1) and (7.2)
can be calculated explicitly. The space $\Omega$ of functions $\omega(t)$ orthogonal
to $K_1(t),K_2(t),\ldots,K_n(t)$, and $y(t)$ is infinite-dimensional, with codimension at most
$n + 1$, and if $v_i(t)$, $i = 1,2,\ldots,n$, satisfy the linear constraints (7.1)
and (7.2), then so do the functions $v_i(t) + \omega_i(t)$ for arbitrary $\omega_i \in \Omega$,
$i = 1,2,\ldots,n$. Thus, given the integral equation (1.1) and the rule of
numerical integration $R_n$, the class of finite rank kernels for which (7.1)
and (7.2) hold will be denoted by

(7.4)  $\{K_n(s,t)\} = \{K(s,t), R_n, y(s)\}$.

A special notation will be used for the homogeneous case $y(s) = 0$, namely

(7.5)  $\{K_n(s,t)\}_0 = \{K(s,t), R_n, 0\}$,

for which $K_n(s,t)$ has to satisfy only the conditions (7.1). The notations
$\{K_n\} = \{K,R_n,y\}$ and $\{K_n\}_0 = \{K,R_n,0\}$ will be used for the corresponding
classes of finite rank linear integral operators $K_n$ with kernels $K_n(s,t)$.
n n 

8.  THE NYSTROM CONSTANT AND ERROR BOUNDS. As approximation of the
integral equation (1.1) by any finite rank equation

(8.1)  $z(s) - \lambda \int_0^1 K_n(s,t) z(t)\,dt = y(s)$,  $0 \le s \le 1$,

with kernel $K_n(s,t)$ chosen from $\{K_n(s,t)\}$ gives precisely the same results
as the Nystrom method with the rule of numerical integration $R_n$, the accuracy
of the Nystrom solution can be studied in terms of how well the integral
operator $K$ can be approximated by finite rank operators $K_n$ with kernels
$K_n(s,t)$ belonging to $\{K_n(s,t)\}$. To this end, define

(8.2)  $\nu = \nu(K,R_n,y) = \inf_{K_n \in \{K_n\}} M(K - K_n)$

to be the Nystrom constant for the given integral equation and rule of numerical
integration. In the homogeneous case, the notation

(8.3)  $\nu_0 = \inf_{K_n \in \{K_n\}_0} M(K - K_n)$

will be used. As the constraints (7.2) are automatically satisfied in the
homogeneous case, one has $\{K_n\} = \{K,R_n,y\} \subset \{K_n\}_0$ for arbitrary $y$, and thus

(8.4)  $\nu_0 \le \nu = \nu(K,R_n,y)$.

The minimal Nystrom constant $\nu_0$ is appropriate for the eigenvalue-eigenfunction
problem for $K(s,t)$, and estimation of the accuracy with which the
resolvent kernel $\Gamma(s,t;\lambda)$ of $K(s,t)$ can be approximated by resolvent kernels

(8.5)  $\Gamma_n(s,t;\lambda) = \sum_{j=1}^{n} \sum_{k=1}^{n} K(s,t_j) w_j \beta_{jk} v_k(t)$

of $K_n(s,t)$ belonging to $\{K_n(s,t)\}_0$, as the function $y(s)$ is not involved
in this calculation.


In the nonsingular case, for

(8.6)  $B_0 = M((I - \lambda K)^{-1})$,

the inequality

(8.7)  $\|x - z\| / \|z\| \le |\lambda| \, \nu B_0$

follows from (5.29) and (8.2). Hence, the accuracy of the Nystrom method is
optimal with respect to approximation of the integral operator $K$ by finite
rank operators $K_n$ belonging to the class $\{K_n\}$. This supports the observation
that good results are usually obtained in practice, as in the examples
cited by Nystrom [13] and Atkinson [4, pp. 102-104].

It is also interesting to note that either $(I - \lambda K_n)^{-1}$ exists for all
$K_n \in \{K_n\}_0$ or $\lambda$ is an eigenvalue of all the kernels $K_n(s,t)$ belonging to
$\{K_n(s,t)\}_0$. This is true because the invertibility of $I - \lambda K_n$ is equivalent
by construction to that of the matrix $A$ given by (5.4) for all kernels
$K_n(s,t) \in \{K_n(s,t)\}_0$ (and hence for all $K_n(s,t) \in \{K(s,t),R_n,y(s)\}$ for
arbitrary $y(s)$). If $A^{-1} = B$ exists, then the classes $\{K_n\}_0$ and $\{K_n\}$
are said to be nonsingular. A sufficient condition for nonsingularity of these
classes is that $(I - \lambda K)^{-1}$ exist, and

(8.8)  $|\lambda| \, \nu_0 B_0 < 1$.

The Nystrom method determines the approximate solution $z(s)$ uniquely,
but not the approximate resolvent kernel (8.5), as the functions
$v_1(t),v_2(t),\ldots,v_n(t)$ are only required to satisfy (7.1). In terms of the
resolvent operators $\Gamma$ of $K$ and $\Gamma_n$ of $K_n$, the identity (5.26) may be
written

(8.9)  $\Gamma - \Gamma_n = (I + \lambda \Gamma_n)(K - K_n)(I + \lambda \Gamma)$

in the nonsingular case. Supposing that (8.8) holds, choose $\epsilon > 0$ and
$K_n^\epsilon \in \{K_n\}_0$ such that

(8.10)  $M(K - K_n^\epsilon) \le \nu_0 + \epsilon < \frac{1}{|\lambda| B_0}$.

Then,

(8.11)  $M(I + \lambda \Gamma_n^\epsilon) = M((I - \lambda K_n^\epsilon)^{-1}) \le \frac{B_0}{1 - |\lambda| (\nu_0 + \epsilon) B_0}$,

and, from (8.9),

(8.12)  $M(\Gamma - \Gamma_n^\epsilon) \le \frac{(\nu_0 + \epsilon) B_0^2}{1 - |\lambda| (\nu_0 + \epsilon) B_0}$.

Thus, the distance $\gamma_0$ from $\Gamma$ to the class $\{\Gamma_n\}_0$ of resolvent operators
of the finite rank operators belonging to $\{K_n\}_0$ satisfies

(8.13)  $\gamma_0 = \inf_{\Gamma_n \in \{\Gamma_n\}_0} M(\Gamma - \Gamma_n) \le \frac{\nu_0 B_0^2}{1 - |\lambda| \nu_0 B_0}$

in the operator norm. The use of a resolvent kernel (8.5) selected from
$\{\Gamma_n(s,t;\lambda)\}_0$ to solve equations (8.1) for various choices of $y(s)$ actually
amounts to a modification of the Nystrom method by the use of approximation data

(8.14)  $\hat{y}_i = \langle v_i,y \rangle$,  $i = 1,2,\ldots,n$,

on the right-hand side of (3.2) instead of the interpolation data $y_i = y(t_i)$,
$i = 1,2,\ldots,n$. The two sets of data will be identical, of course, if
$\Gamma_n(s,t;\lambda)$ is the resolvent kernel of a kernel $K_n(s,t)$ belonging to the class
$\{K_n(s,t)\} = \{K(s,t),R_n,y(s)\}$.

As noted above, the Nystrom method may also be applied to the homogeneous
integral equation

(8.15)  $x(s) - \lambda \int_0^1 K(s,t) x(t)\,dt = 0$,  $0 \le s \le 1$,

to obtain approximate eigenvalues and eigenfunctions of $K(s,t)$ by solving
the homogeneous system

(8.16)  $z_i - \lambda \sum_{j=1}^{n} K(t_i,t_j) w_j z_j = 0$,  $i = 1,2,\ldots,n$,

and using the interpolation formula

(8.17)  $z(s) = \lambda \sum_{j=1}^{n} K(s,t_j) w_j z_j$

for the corresponding right eigenfunctions. By construction, all the kernels
$K_n(s,t)$ in the class $\{K_n(s,t)\}_0$ have the same sets of eigenvalues
$\lambda_1^{(n)},\lambda_2^{(n)},\ldots,\lambda_n^{(n)}$ and corresponding right eigenfunctions $z_1^{(n)}(s)$,
$z_2^{(n)}(s),\ldots,z_n^{(n)}(s)$, including the possibilities of multiplicity of eigenvalues
and the existence of generalized eigenfunctions. This is because the functions
$v_1(t),v_2(t),\ldots,v_n(t)$ do not enter into these calculations explicitly.
It is also possible to obtain $O(\nu_0)$ error estimates for $|\lambda - \lambda^{(n)}|$
and $\|z - z^{(n)}\|$, for example, by setting up the eigenvalue-eigenfunction
problem as a nonlinear operator equation, and use of the Kantorovich theorem
[2] or some other technique [12, pp. 228-231]. As these are outside the scope
of this paper, explicit error bounds will not be given here.
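A small Python sketch shows how (8.16) yields an approximate eigenvalue in practice. It is an illustration under assumptions not made in the paper: the function names are hypothetical, the composite trapezoidal rule is used for $R_n$, the kernel is the Green's function $K(s,t) = \min(s,t)$ whose smallest eigenvalue is $\pi^2/4$ with eigenfunction $\sin(\pi s/2)$, and plain power iteration replaces a general matrix eigenvalue routine.

```python
import math

def trapezoid_rule(n):
    """Nodes and weights of the composite trapezoidal rule on [0, 1] with n nodes."""
    h = 1.0 / (n - 1)
    nodes = [i * h for i in range(n)]
    weights = [h * (0.5 if i in (0, n - 1) else 1.0) for i in range(n)]
    return nodes, weights

def smallest_eigenvalue(kernel, nodes, weights, iterations=60):
    """Power iteration on the matrix (K(t_i, t_j) w_j) of the system (8.16).

    The system reads M z = (1/lam) z, so the dominant eigenvalue mu of M
    gives the eigenvalue of smallest modulus lam = 1/mu. Returns lam and the
    node values z_i scaled so that the largest component equals 1.
    """
    n = len(nodes)
    m = [[kernel(nodes[i], nodes[j]) * weights[j] for j in range(n)]
         for i in range(n)]
    z = [1.0] * n
    mu = 0.0
    for _ in range(iterations):
        mz = [sum(row[j] * z[j] for j in range(n)) for row in m]
        mu = max(mz, key=abs)
        z = [v / mu for v in mz]
    return 1.0 / mu, z

nodes, weights = trapezoid_rule(41)
lam, z_nodes = smallest_eigenvalue(min, nodes, weights)  # kernel K(s,t) = min(s,t)
```

With 41 nodes the computed `lam` differs from $\pi^2/4 \approx 2.4674$ in the fourth significant digit. Values of the eigenfunction between the nodes would follow from the interpolation formula (8.17).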

The calculation of approximate left eigenfunctions $p_1^{(n)}(t),p_2^{(n)}(t),
\ldots,p_n^{(n)}(t)$ is carried out on the basis of the transposed homogeneous system

(8.18)  $p_j - \lambda \sum_{i=1}^{n} p_i K(t_i,t_j) w_j = 0$,  $j = 1,2,\ldots,n$,

and the interpolation formula

(8.19)  $p(t) = \lambda \sum_{i=1}^{n} p_i v_i(t)$

for the left eigenfunctions of $K_n(s,t)$, which are hence linear combinations
of $v_1(t),v_2(t),\ldots,v_n(t)$, and depend on the particular kernel chosen,
unlike (8.17). In this case, the inhomogeneous equation (8.1) will be
solvable without further approximation of $y(s)$ if

(8.20)  $\langle v_i,y \rangle = 0$,  $i = 1,2,\ldots,n$,

or, if $K_n(s,t) \in \{K(s,t),R_n,y(s)\}$, one has

(8.21)  $y(t_i) = 0$,  $i = 1,2,\ldots,n$.

Conditions (8.20) and (8.21) are sufficient that $\langle p,y \rangle = 0$ for all solutions
$p(t)$ of the transposed homogeneous equation

(8.22)  $p(t) - \lambda \int_0^1 p(s) K_n(s,t)\,ds = 0$,  $0 \le t \le 1$,

for the eigenvalue $\lambda$.

9.  COMPUTABLE ERROR BOUNDS. The error bounds depending on the Nystrom
constant given by (8.7) or (8.11), for example, are theoretical in character,
as the unknown (but fixed) quantity $B_0$ is involved. However, for some
given operator $K_n \in \{K_n\}$, the quantities

(9.1)  $\nu_n = M(K - K_n)$,  $B_n = M((I - \lambda K_n)^{-1})$

may be computed in the nonsingular case, and, if $|\lambda| \nu_n B_n < 1$,
then $(I - \lambda K)^{-1}$ exists, and $B_0$ may be estimated by

(9.2)  $B_0 \le \frac{B_n}{1 - |\lambda| \nu_n B_n}$,

as follows from (5.31). The known value $\nu_n$ may, of course, be used as an
upper bound for $\nu$ or $\nu_0$. One has also

(9.3)  $\|x - z\| / \|x\| \le |\lambda| \, \nu_n B_n$

directly from (5.28).

Finding the Nystrom constant $\nu$ (or $\nu_0$) requires the solution of a
nonlinear optimization problem subject to linear constraints (7.1) and (7.2)
(or (7.1) only). This could be of comparable or greater difficulty than the
original task of solving the linear integral equation (1.1). In some
circumstances, however, it might be desired to estimate the Nystrom constant directly,
rather than use a particular kernel $K_n(s,t)$ belonging to $\{K_n(s,t)\}$ or
$\{K_n(s,t)\}_0$. It would also be useful if the estimation process furnishes
information concerning good choices of the functions $v_1(t),v_2(t),\ldots,v_n(t)$.
This can be done in $L_2[0,1]$ by classical calculus of variations technique
applied to the upper bound (5.15) for the functional $M(K)$. Setting
$v = (v_1,v_2,\ldots,v_n)$, this approximating optimization problem may be posed as

(9.4)  minimize  $F[v] = \int_0^1 \int_0^1 |K(s,t) - K_n(s,t)|^2\,ds\,dt$,

with $v$ subject to (7.1) and (7.2). Here, the fully constrained case will be
treated in detail, with appropriate modifications indicated for the problem
in $\{K_n(s,t)\}_0$ corresponding to only the constraints (7.1), and also the
unconstrained problem of minimizing $F[v]$ over all $v$ with $v_i \in L_2[0,1]$,
$i = 1,2,\ldots,n$. Introducing Lagrange multipliers $\Lambda_{ij}$ for the constraints
(7.1) and $\Lambda_i$ for (7.2), the kernels of the Gateaux derivatives of the
functional

(9.5)  $\Phi[v] = F[v] + \sum_{i=1}^{n} \sum_{j=1}^{n} \Lambda_{ij} ( \langle v_i,K_j \rangle - K(t_i,t_j) ) + \sum_{i=1}^{n} \Lambda_i ( \langle v_i,y \rangle - y(t_i) )$

with respect to $v_1,v_2,\ldots,v_n$ will vanish at the solution of the optimization
problem. This necessary condition is equivalent to the system of equations

(9.6)  $-2 \int_0^1 K_i(s) K(s,t) w_i\,ds + 2 \sum_{j=1}^{n} ( \int_0^1 K_i(s) K_j(s) w_i w_j\,ds ) v_j(t) + \sum_{j=1}^{n} \Lambda_{ij} K_j(t) + \Lambda_i y(t) = 0$,

$i = 1,2,\ldots,n$.

This can be simplified somewhat by writing

(9.7)  $\psi_i(t) = \int_0^1 K_i(s) K(s,t) w_i\,ds$,  $i = 1,2,\ldots,n$,

(9.8)  $c_{ij} = \int_0^1 K_i(s) K_j(s) w_i w_j\,ds$,  $i,j = 1,2,\ldots,n$.

Thus, (9.6) becomes

(9.9)  $\sum_{j=1}^{n} c_{ij} v_j(t) = \psi_i(t) - \frac{1}{2} \sum_{j=1}^{n} \Lambda_{ij} K_j(t) - \frac{1}{2} \Lambda_i y(t)$,

$i = 1,2,\ldots,n$. If now for $C = (c_{ij})$, one has that $C^{-1} = D = (d_{ij})$ exists, then

(9.10)  $v_i(t) = \sum_{j=1}^{n} d_{ij} \psi_j(t) - \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} d_{ij} \Lambda_{jk} K_k(t) - \frac{1}{2} \sum_{j=1}^{n} d_{ij} \Lambda_j y(t)$,  $i = 1,2,\ldots,n$,

which expresses the functions $v_1(t),v_2(t),\ldots,v_n(t)$ as linear combinations
of the unknown Lagrange multipliers, with known functions as coefficients.
The unconstrained solutions $v_i^{u}(t)$ of (9.4), $i = 1,2,\ldots,n$, may be read
directly from (9.10) as

(9.11)  $v_i^{u}(t) = \sum_{j=1}^{n} d_{ij} \psi_j(t) = \sum_{j=1}^{n} d_{ij} \int_0^1 K_j(s) K(s,t) w_j\,ds$,

which does not involve the Lagrange multipliers. The kernel

(9.12)  $K_n^{u}(s,t) = \sum_{j=1}^{n} K(s,t_j) w_j v_j^{u}(t)$

is the best approximation to $K(s,t)$ in the sense of (9.4), not necessarily
in the operator norm. To use (9.12), one must set up and solve the linear
system (6.4), which requires approximation data on both sides, and gives
results which will differ from the Nystrom solution, except in special cases.
Linear systems of equations for the Lagrange multipliers may be obtained by
substituting (9.10) into the constraint equations (7.1) and (7.2). From (7.1),


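(9.13)  $K(t_h,t_i) = \langle v_h,K_i \rangle = \sum_{j=1}^{n} d_{hj} \langle \psi_j,K_i \rangle - \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} d_{hj} \Lambda_{jk} \langle K_k,K_i \rangle - \frac{1}{2} \sum_{j=1}^{n} d_{hj} \langle y,K_i \rangle \Lambda_j$,  $h,i = 1,2,\ldots,n$,

and (7.2) becomes

(9.14)  $y(t_i) = \langle v_i,y \rangle = \sum_{j=1}^{n} d_{ij} \langle \psi_j,y \rangle - \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} d_{ij} \Lambda_{jk} \langle K_k,y \rangle - \frac{1}{2} \sum_{j=1}^{n} d_{ij} \langle y,y \rangle \Lambda_j$,  $i = 1,2,\ldots,n$.

If one sets $\Lambda_1 = \Lambda_2 = \cdots = \Lambda_n = 0$ in (9.13) and solves for $\Lambda_{hi} = \Lambda_{hi}^0$,
$h,i = 1,2,\ldots,n$, the corresponding functions

(9.15)  $v_i^0(t) = \sum_{j=1}^{n} d_{ij} \int_0^1 K_j(s) K(s,t) w_j\,ds - \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} d_{ij} \Lambda_{jk}^0 K_k(t)$

give the kernel $K_n^0(s,t)$ which minimizes the functional $F[v]$ over
$\{K_n(s,t)\}_0$. By solving (9.13) and (9.14) for the Lagrange multipliers, one
can find $v_1(t),v_2(t),\ldots,v_n(t)$ for the completely constrained problem.
By substitution, and since (5.15) bounds the operator norm by the square root of the
double integral in (9.4), the estimates $\nu_0 \le F[v^0]^{1/2}$ and $\nu \le F[v]^{1/2}$ can be obtained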
for the respective Nystrom constants.


NUMERICAL SOLUTIONS TO THE
LATERAL STABILITY OF A MISSILE

Watervliet Arsenal, Watervliet, NY 12189

ABSTRACT. Numerical data are presented in this paper on the stability
behavior of a free-flying column subjected to an axial thrust, the direction
of which can be adjusted to improve the stability characteristics. This is a
basic problem of missile design for lateral stability and has not been fully
understood prior to this date. The results obtained by the finite element
method are plotted, showing clearly the effects on the structure in various
types of instability due to different thrust directional control parameters.
These data have been substantiated by a recent analysis to be published
separately.

1. INTRODUCTION. Numerical results are presented in this paper on
the lateral stability of a free-flying Euler-Bernoulli column subjected at
one end to an axial thrust, the direction of which can be rotated through
a small angle about the tangent of the column. This is a basic problem of
a flexible missile, many aspects of which have not been fully explored.
Recently, several elusive questions on the solution of this problem have
been resolved by the use of asymptotic expansions [1]. A brief history
of this problem and the associated difficulties are discussed in [1].
The purpose of this paper is to present some additional numerical data on
this fundamental and interesting problem, to recapitulate the solution
formulations and to provide some physical interpretations of the solutions.

2. GOVERNING EQUATIONS AND STABILITY PARAMETERS. The differential
equation of the lateral motion of an Euler-Bernoulli beam can be written as

$$(EIu'')'' + (Pu')' + \rho A\,\ddot{u} = 0 \tag{1a}$$

The boundary conditions associated with a free-flying column subjected to a
constant thrust with directional control are

$$EIu'' = (EIu'')' = 0 \quad \text{at } x = 0 \tag{1b,1c}$$

$$EIu'' = 0 \quad \text{at } x = \ell \tag{1d}$$

$$(EIu'')' - PK_\theta u' = 0 \quad \text{at } x = \ell \tag{1e}$$
In Eqs. (1), $u = u(x,t)$ is the lateral disturbance of the column from its
equilibrium position, a prime (') denotes differentiation with respect to
the spatial coordinate $x$, a dot (·) differentiation with respect to time
$t$, $E$ is Young's modulus, $\rho$ the density of the material, $I$ the second
moment and $A$ the area of the cross-section, $\ell$ the length of the column,
$P$ the axial thrust acting at the end $x = \ell$, and $K_\theta$ a nondimensional
design parameter indicating the amount of rotation the thrust is to have.
It will be shown in this paper that the value of $K_\theta$ has a great effect on
the stability behavior of the column under consideration (Figure 1).

To simplify our discussion, we shall consider a uniform column, and
Eqs. (1) will be nondimensionalized. The quantities in length will be
nondimensionalized through a division by $\ell$ and those in time through a
division by a constant $c$, where

$$c = \left(\frac{\rho A \ell^4}{EI}\right)^{1/2} \tag{2}$$
which  has  a unit  of  real  time.  Thus  Eqs.  (1)  become 

$$u'''' + Q(xu')' + \ddot{u} = 0 \tag{3a}$$

$$u''(0) = 0, \quad u'''(0) = 0, \quad u''(1) = 0 \tag{3b,3c,3d}$$

$$u'''(1) - K_\theta Q\,u'(1) = 0 \tag{3e}$$

where in Eqs. (3), the nondimensionalized thrust $Q$ is given by

$$Q = \frac{P\ell^2}{EI} \tag{4}$$

For vibrations and quasi-dynamic stability problems, one can eliminate
the time variable by assuming that

$$u(x,t) = u(x)\,e^{\lambda t} \tag{5}$$

Eq. (3a) then becomes

$$u'''' + Q(xu')' + \lambda^2 u = 0 \tag{3a'}$$

and the initial conditions do not enter into the problem. "Eqs. (3')" will
be used to refer to Eqs. (3) with Eq. (3a) replaced by Eq. (3a'). For most
of the results of this paper Eqs. (3') will be used. However, in the case
of repeated eigenvalues with identical eigenfunctions, Eqs. (3) must be
used to find another independent solution. This point has been discussed
in reference [1].
It is clear from Eq. (5) that the parameter $\lambda$ dictates the stability
nature of the problem. A purely imaginary $\lambda$ indicates stable vibrations;
a purely real and positive $\lambda$, instability of divergence; and a complex
$\lambda$ with positive real part, instability of flutter. Since $\lambda$ appears only
as $\lambda^2$ in Eqs. (3'), it is equally true that a real negative $\lambda^2$ indicates
stable vibrations; a real positive $\lambda^2$, instability of divergence; and a
complex $\lambda^2$, instability of flutter.
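The classification in terms of $\lambda^2$ can be written as a small helper. This is a sketch only: the function name, the tolerance and the label "neutral" for a zero eigenvalue (a rigid body mode) are assumptions made here for illustration.

```python
def classify(lam2, tol=1e-9):
    """Classify one eigenvalue lambda^2 of Eqs. (3')."""
    z = complex(lam2)
    if abs(z.imag) > tol:
        return "flutter"      # complex lambda^2: one root lambda has Re > 0
    if z.real > tol:
        return "divergence"   # real positive lambda^2
    if z.real < -tol:
        return "stable"       # real negative lambda^2: purely imaginary lambda
    return "neutral"          # lambda^2 = 0, e.g. a rigid body mode

print([classify(v) for v in (-500.0, 3.0, complex(-200.0, 50.0), 0.0)])
```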


3.  FINITE  ELEMENTS  WITH  UNCONSTRAINED  VARIATIONAL  FORMULATIONS.  The 


concerning nonself-adjoint boundary value problems [2]. In this
section, we shall apply it to the special problem of missile stability. The
key to this solution is an unconstrained, adjoint variational principle.
First, we shall show that such a principle can be constructed and is
equivalent to the differential equations together with the boundary conditions.
Second, we shall show that this principle can be routinely discretized and
leads to the matrix equation required for the numerical solutions.

In conjunction with Eqs. (3'), an adjoint variable $v(x)$ is introduced
in the following variational statement

$$\delta J = 0 \tag{6a}$$

where

$$J = J(u,v) = \int_0^1 (u''v'' - Qxu'v' + \lambda^2 uv)\,dx + Q(1 + K_\theta)\,u'(1)\,v(1) \tag{6b}$$

Through integrations by parts, one can easily establish that the necessary
and sufficient condition for Eqs. (6) is the original boundary value problem
of Eqs. (3') together with the following adjoint boundary value problem

$$v'''' + Q(xv')' + \lambda^2 v = 0 \tag{7a}$$

$$v''(0) = 0, \quad v'''(0) = 0 \tag{7b,7c}$$

$$v''(1) + Q(1 + K_\theta)\,v(1) = 0 \tag{7d}$$

$$v'''(1) + Qv'(1) = 0 \tag{7e}$$
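The integration by parts can be spelled out for the variation with respect to $v$. The steps below are a worked sketch written for this purpose, not reproduced from the paper; $\delta_v J$ denotes the variation with $u$ held fixed.

```latex
\begin{align*}
\delta_v J
 &= \int_0^1 \bigl(u''\,\delta v'' - Qxu'\,\delta v' + \lambda^2 u\,\delta v\bigr)\,dx
    + Q(1+K_\theta)\,u'(1)\,\delta v(1) \\
 &= \int_0^1 \bigl[u'''' + Q(xu')' + \lambda^2 u\bigr]\,\delta v\,dx
    + \bigl[u''\,\delta v'\bigr]_0^1 - \bigl[u'''\,\delta v\bigr]_0^1
    - \bigl[Qxu'\,\delta v\bigr]_0^1 + Q(1+K_\theta)\,u'(1)\,\delta v(1) \\
 &= \int_0^1 \bigl[u'''' + Q(xu')' + \lambda^2 u\bigr]\,\delta v\,dx
    + u''(1)\,\delta v'(1) - u''(0)\,\delta v'(0) + u'''(0)\,\delta v(0)
    - \bigl[u'''(1) - K_\theta Q\,u'(1)\bigr]\,\delta v(1)
\end{align*}
```

Because $\delta v$ is arbitrary in the interior and at both ends, each bracket must vanish separately. This gives the field equation (3a') and the four boundary conditions (3b) to (3e). Repeating the same steps with $u$ varied and $v$ held fixed yields Eqs. (7).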


However, in so far as the original problem is linear, $u(x)$ and $v(x)$ are quite
independent of each other. The solution $u(x)$ can be obtained without
solving for $v(x)$. Thus, to arrive at the finite element matrix equation,
one takes the variation of $v(x)$ alone and keeps $u(x)$ fixed.

$$(\delta J)_u = 0 = \int_0^1 (u''\,\delta v'' - Qxu'\,\delta v' + \lambda^2 u\,\delta v)\,dx + Q(1 + K_\theta)\,u'(1)\,\delta v(1) \tag{8}$$

In the process of discretization, the column is divided into many segments
(elements). Again for the sake of simplicity, these segments are taken to
be of equal length. Let $L$ be the number of elements and introduce the
local coordinate

$$\xi = \xi^{(i)} = L[x - (i - 1)/L] \tag{9}$$

where the superscript $i$ indicates the $i$-th element. In terms of $\xi$, Eq. (8)
becomes

$$\sum_{i=1}^{L} \int_0^1 \left\{ L^3 u^{(i)''}\,\delta v^{(i)''} - Q[\xi + i - 1]\,u^{(i)'}\,\delta v^{(i)'} + \frac{\lambda^2}{L}\,u^{(i)}\,\delta v^{(i)} \right\} d\xi + Q(1 + K_\theta)\,u^{(L)'}(1)\,\delta v^{(L)}(1) = 0 \tag{10}$$

Now, introduce the generalized coordinate vectors


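$$U^{(i)T} = \{u_1^{(i)} \;\; u_2^{(i)} \;\; u_3^{(i)} \;\; u_4^{(i)}\} \tag{11a}$$

$$V^{(i)T} = \{v_1^{(i)} \;\; v_2^{(i)} \;\; v_3^{(i)} \;\; v_4^{(i)}\} \tag{11b}$$

and the displacement function vector

$$a^T(\xi) = (1 - 3\xi^2 + 2\xi^3 \;\;\; \xi - 2\xi^2 + \xi^3 \;\;\; 3\xi^2 - 2\xi^3 \;\;\; -\xi^2 + \xi^3) \tag{12}$$

such that

$$u^{(i)}(\xi) = a^T(\xi)\,U^{(i)}, \quad v^{(i)}(\xi) = a^T(\xi)\,V^{(i)} \tag{13}$$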

The superscript T indicates "transpose of a matrix". Eqs. (13) are
substituted into (10) to yield

$$\sum_{i=1}^{L} \delta V^{(i)T} \left\{ L^3 C - Q[D + (i - 1)B] + \frac{\lambda^2}{L} A \right\} U^{(i)} + Q(1 + K_\theta)\,\delta V^{(L)T} E\,U^{(L)} = 0 \tag{14}$$

where

$$A = \int_0^1 a\,a^T\,d\xi, \quad B = \int_0^1 a'a'^T\,d\xi, \quad C = \int_0^1 a''a''^T\,d\xi, \quad D = \int_0^1 \xi\,a'a'^T\,d\xi, \quad E = a(1)\,a'^T(1) \tag{15}$$
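The local matrices of Eqs. (15) follow directly from the cubic displacement functions of Eq. (12). The sketch below evaluates them with exact polynomial integration in NumPy; the helper names `integral01` and `gram` are illustrative choices and do not come from the paper.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Displacement functions of Eq. (12), coefficients in ascending powers of xi.
a = [P([1, 0, -3, 2]), P([0, 1, -2, 1]), P([0, 0, 3, -2]), P([0, 0, -1, 1])]
d1 = [p.deriv(1) for p in a]      # a'
d2 = [p.deriv(2) for p in a]      # a''
xi = P([0, 1])

def integral01(p):
    """Exact integral of a polynomial over 0 <= xi <= 1."""
    q = p.integ()
    return q(1.0) - q(0.0)

def gram(f, g, weight=P([1])):
    """Matrix with entries integral(weight * f_r * g_c) over [0, 1]."""
    return np.array([[integral01(weight * fr * gc) for gc in g] for fr in f])

A = gram(a, a)                    # integral of a a^T
B = gram(d1, d1)                  # integral of a' a'^T
C = gram(d2, d2)                  # integral of a'' a''^T
D = gram(d1, d1, weight=xi)       # integral of xi a' a'^T
E = np.outer([p(1.0) for p in a], [p(1.0) for p in d1])   # a(1) a'^T(1)

print(np.round(C, 12))
```

For a unit-length element, C and A reproduce the familiar beam stiffness and consistent mass patterns. E has a single nonzero entry, which couples the end deflection of the test function to the end slope of the solution.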


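Through the use of the requirement that the displacement and slope be
continuous at the nodes, i.e.,

$$u_3^{(i-1)} = u_1^{(i)}, \quad u_4^{(i-1)} = u_2^{(i)}, \quad i = 2,3,\ldots,L \tag{16}$$

Eq. (14) can be written as

$$\delta V^T (K + \lambda^2 M)\,U = 0 \tag{17}$$

where

$$U^T = \{u_1^{(1)} \; u_2^{(1)} \; u_3^{(1)} \; u_4^{(1)} \; u_3^{(2)} \; u_4^{(2)} \; \cdots \; u_3^{(L)} \; u_4^{(L)}\}, \quad V^T = \{v_1^{(1)} \; v_2^{(1)} \; v_3^{(1)} \; v_4^{(1)} \; v_3^{(2)} \; v_4^{(2)} \; \cdots \; v_3^{(L)} \; v_4^{(L)}\} \tag{18}$$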

The  global  stiffness  matrix  K and  inertia  matrix  M are  assembled  from  the 
local  matrices  of  Eqs.  (15)  through  standard  procedures  [3]. 


As remarked earlier in this section, $\delta v$ is unconstrained and is
independent of $u$. Thus $\delta V$ is unconstrained and independent of $U$. Eq. (17)
then reduces to

$$(K + \lambda^2 M)\,U = 0 \tag{19}$$

which is the final matrix equation to be solved.
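The assembly and solution of Eq. (19) can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used for the paper: the nodal unknowns are taken as the deflection and the physical slope du/dx (rather than generalized coordinates in the local variable), the element integrals are evaluated by Gauss quadrature, and the function names are hypothetical.

```python
import numpy as np

def assemble(L, Q, K_theta):
    """Global K and M of (K + lam2*M) U = 0 for L equal two-node beam elements."""
    h = 1.0 / L                                  # element length, column length = 1
    n = 2 * (L + 1)                              # deflection and slope at each node
    K = np.zeros((n, n))
    M = np.zeros((n, n))
    pts, wts = np.polynomial.legendre.leggauss(4)
    pts, wts = 0.5 * (pts + 1.0), 0.5 * wts      # Gauss rule mapped to [0, 1]
    for e in range(L):
        Ke = np.zeros((4, 4))
        Me = np.zeros((4, 4))
        for s, w in zip(pts, wts):
            # Cubics of Eq. (12); slope functions scaled by h so that the
            # nodal slopes are du/dx (an assumption of this sketch).
            a = np.array([1 - 3*s**2 + 2*s**3, h*(s - 2*s**2 + s**3),
                          3*s**2 - 2*s**3, h*(-s**2 + s**3)])
            da = np.array([-6*s + 6*s**2, h*(1 - 4*s + 3*s**2),
                           6*s - 6*s**2, h*(-2*s + 3*s**2)]) / h
            dda = np.array([-6 + 12*s, h*(-4 + 6*s),
                            6 - 12*s, h*(-2 + 6*s)]) / h**2
            x = (e + s) * h
            Ke += w * h * (np.outer(dda, dda) - Q * x * np.outer(da, da))
            Me += w * h * np.outer(a, a)
        dofs = slice(2*e, 2*e + 4)
        K[dofs, dofs] += Ke
        M[dofs, dofs] += Me
    # Boundary term Q(1 + K_theta) u'(1) dv(1): row = end deflection, column = end slope.
    K[n - 2, n - 1] += Q * (1.0 + K_theta)
    return K, M

def eigenvalues_lam2(L, Q, K_theta):
    """All eigenvalues lambda^2 of Eq. (19); K is nonsymmetric when Q != 0."""
    K, M = assemble(L, Q, K_theta)
    return np.linalg.eigvals(np.linalg.solve(M, -K))

lam2 = eigenvalues_lam2(L=10, Q=0.0, K_theta=0.0)
omega = np.sqrt(np.sort(-lam2.real)[2:4])        # skip the two rigid body zeros
print(omega)
```

At zero thrust the two printed values should fall close to the bending eigenvalues quoted in Subsection 4.1. For small $Q$, a first-order perturbation estimate worked out here as a check (it is not stated in the paper) gives $\lambda^2 \approx -6K_\theta Q$ for the branch that starts from rigid body rotation. Its sign agrees with the stable and divergent behavior described for $K_\theta > 0$ and $K_\theta < 0$.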

4. NUMERICAL RESULTS AND INTERPRETATIONS. Numerical results obtained
by the formulations described in the previous section have been plotted in
Figures 2 through 6. We shall discuss their significance for structural
stability in several categories.
4.1 The Case of Zero Thrust. The two lowest eigenvalues of the bending
modes are 22.37 and 61.70 as shown in Figures 2 through 6. There are two
zero eigenvalues (in terms of $\lambda^2$). One corresponds to the rigid body
translation, the other to the rigid body rotation. As the thrust increases
from zero, the rigid body translation mode remains a valid mode of motion
regardless of the value of $K_\theta$. However, the rigid body rotation ceases
to be a mode. It evolves into various types of stable and unstable motion
depending on the design value of $K_\theta$, as will be described in the sequel.

4.2 The Case $K_\theta = 0$. The thrust in this case is a follower force.
As it increases from zero, the rigid body rotation ceases to be a mode.
In other words, it is no longer a valid eigenfunction of the boundary value
problem defined by Eqs. (3'). Instead it degenerates into a rigid body
translation. However, the rigid body rotation does exist, not as a mode,
but together with a parametric term as a solution to the partial differential
equation and boundary conditions of Eqs. (3). This has been discussed
in [1]. Elastic instability occurs only when the third and the fourth
eigenvalue branches coalesce, as $Q$ reaches a value of $11.126\pi^2$ (Figure 2).
The eigenvalue $\lambda^2$ becomes complex from there on, meaning the onset of
flutter instability.

4.3 The Case $K_\theta > 0$. The direction of the thrust is controlled so
that the thrust is rotated in the same direction as the slope of the column.
As the magnitude of the thrust increases from zero, the eigenvalue $\lambda$
corresponding to the rigid body rotation at zero thrust becomes imaginary,
meaning a vibratory mode (Figures 3 and 4). This new mode becomes the lowest
mode of bending, and it remains a vibratory mode until the thrust $Q$
reaches the value $Q_1 = 2.595\pi^2$. At $Q_1$, the eigenvalue becomes zero
again, and the mode shape degenerates into rigid body translation (not
rotation). The rigid body rotation motion exists only with a parametric
excitation, as in the case mentioned in Subsection 4.2. Thus from zero thrust
to $Q_1$, there is a range of elastic stability. Beyond $Q_1$, the eigenvalue
$\lambda$ becomes real and positive, meaning divergence instability. In other
words, the column will buckle.

4.4 The Case $K_\theta < 0$. The direction of the thrust is now rotated in
the opposite sense to that of the column's slope. As the thrust $Q$ increases
from zero, one of the zero eigenvalues, corresponding to the rigid body
rotation at zero thrust, immediately becomes real and positive: a case of
buckling under arbitrarily small thrust! The physical implication of this
buckling instability is somewhat unconventional. It appears to be a feature
unique to stability problems in which rigid body motions are possible
(Figures 5 and 6).
ABSTRACT. Aerodynamic properties of artillery shell such as normal
force and pitching moment reach peak values in a narrow transonic Mach
number range. In order to compute these quantities, numerical techniques
have been developed to obtain solutions to the three-dimensional transonic
small disturbance equation about slender bodies at angle of attack. The
computation is based on a plane relaxation technique involving Fourier
transforms to partially decouple the three-dimensional difference equations.
Particular care is taken to assure accurate solutions near corners found
in shell designs. Computed surface pressures are compared to experimental
measurements for circular arc and cone cylinder bodies which have been
selected as test cases. Computed pitching moments are compared to range
measurements for a typical projectile shape.

1. INTRODUCTION. When designing an artillery shell, it is necessary to
develop a vehicle which will fly with stability under a wide variety of
aerodynamic conditions. A range of propellant charges may be used giving the
shell launch velocities covering a spectrum from subsonic to supersonic. The
shell will also slow in flight, particularly near the apex of its trajectory.
It is, therefore, important that the shell fly with stability in subsonic,
transonic, and supersonic flight regimes.

Difficulties  are  often  experienced  by  projectiles  at  transonic  velocities. 
Aerodynamic  properties  such  as  drag  and  the  pitching  moment,  which  is  critical 
to  stability,  will  reach  peak  values  at  some  transonic  Mach  number.  This  peak 
can  form  in  a Mach  number  range  which  may  be  very  limited  as,  for  example, 
between  .92  < M < .94.  The  sharpness  of  this  critical  behavior  as  well  as  the 
value  of  the  critical  Mach  number  are  very  sensitive  to  body  geometry.  A 
slight  change  in  boattail  length  may  make  the  difference  between  a successful 
shell  and  one  whose  behavior  is  unpredictable. 

Aerodynamic  range  and  wind  tunnel  testing  are  difficult  and  expensive, 
particularly at transonic velocities. Therefore, it is of great importance
for  artillery  projectile  design  to  develop  a computational  capability  which 
can  provide  guidance  in  choosing  shell  configurations  and  reduce  aerodynamic 
testing  requirements.  Techniques  have  been  established  and  computers  are  now 
available  which  should  make  possible  the  development  of  useful  computational 
design  tools,  particularly  for  the  limited  and  usually  simple  geometries 
found  in  artillery  projectile  shapes. 
The basic approach used by BRL to compute flow over supersonic
projectiles [1] has been to solve for the inviscid flow field around the body
and to then compute the flow in the boundary layer using turbulence modeling.
This approach seems quite feasible for use in the transonic problem
considering the results shown by Schmidt, Stock, and Fritz [2], who have
coupled integral boundary layer calculations to transonic inviscid solutions.
The major thrust of the transonic work reported here involves the development
of numerical techniques for the computation of three-dimensional transonic
inviscid flow past artillery projectiles. A three-dimensional, compressible,
turbulent, boundary layer code using finite difference techniques has been
developed by BRL [1] for use with supersonic flow, and its carry-over should
be straightforward.

Complications in the body geometry such as the rotating band and rifling
have been neglected. The resulting simplified shape, however, exhibits the
basic aerodynamic properties of the projectile. One complication which
cannot be ignored is the presence of corners at the junction of the ogive and
cylindrical portions of the shell and at the junction of the cylindrical
portion and the boattail. These corners are responsible for the critical
transonic aerodynamic behavior of the shell.

2. COMPUTATION TECHNIQUES. Transonic techniques based on type dependent
differencing have come into wide acceptance since they were developed by
Murman and Cole [3]. There are now several schemes like that developed by
Bailey and Ballhaus [4] that will solve three-dimensional potential flow over
wings and wing body combinations. Considerable simplification, however, may
be achieved in the projectile problem if the code is restricted to axially
symmetric bodies. Further, by using a cylindrical coordinate system which
fits the body surface, no vertical or horizontal preferred directions are
established. This provides an important increase in accuracy. The axially
symmetric problem at angle of attack, although it is simpler than the wing
body problem, does not have the same range of application and has not
attracted as wide an interest. There has been, however, some recent numerical
work by Reyhner [5], who has studied axisymmetric inlets.

Even though techniques are available to solve the full inviscid potential
equation directly [6], the approach chosen for this study has been to solve the
transonic small disturbance equation

$$[(1 - M^2) - M^2(\gamma + 1)\,\phi_z]\,\phi_{zz} + \phi_{rr} + \phi_r/r + \phi_{\theta\theta}/r^2 = 0 , \tag{1}$$

which is an approximation to the full equation. This is a nonlinear partial
differential equation of mixed elliptic-hyperbolic type written in a
cylindrical coordinate system $(z,r,\theta)$ as shown in Figure 1. The free stream
Mach number is given by $M$ in this equation and the ratio of specific heats
(1.4 for air) is represented by $\gamma$. The solution to this equation has been
shown by Bailey [7] to give good results for the slender body case at zero
angle of attack. This equation has also been studied both numerically and
analytically for many years and it is simple enough that much valuable
insight may be gained from it, particularly in regions near the body where it
reduces to

$$\phi_{rr} + \phi_r/r + \phi_{\theta\theta}/r^2 = 0 . \tag{2}$$

This last equation is a Laplace equation that may be solved analytically.
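The mixed type of equation (1) can be made concrete by evaluating the coefficient of $\phi_{zz}$ and switching the difference formula on its sign, which is the idea behind type dependent differencing. The routine below is a simplified sketch along one line in $z$ written for illustration; it is not the scheme of the paper, published schemes differ in details such as where the coefficient is evaluated, and the function name is hypothetical.

```python
import numpy as np

def tsd_zz_term(phi, dz, M, gamma=1.4):
    """Return (coeff, phi_zz) at interior points of one line in z.

    coeff is (1 - M^2) - M^2 (gamma + 1) phi_z with a central phi_z.
    phi_zz is central where coeff > 0 (elliptic, locally subsonic) and
    upwind where coeff <= 0 (hyperbolic, locally supersonic).
    """
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    coeff = np.zeros(n)
    phi_zz = np.zeros(n)
    for i in range(2, n - 1):
        phi_z = (phi[i + 1] - phi[i - 1]) / (2.0 * dz)
        coeff[i] = (1.0 - M**2) - M**2 * (gamma + 1.0) * phi_z
        if coeff[i] > 0.0:
            phi_zz[i] = (phi[i + 1] - 2.0 * phi[i] + phi[i - 1]) / dz**2
        else:
            phi_zz[i] = (phi[i] - 2.0 * phi[i - 1] + phi[i - 2]) / dz**2
    return coeff, phi_zz

z = np.linspace(0.0, 1.0, 11)
coeff, phi_zz = tsd_zz_term(z**2, dz=0.1, M=0.9)
print(coeff[2:-1])
```

The upwind formula draws only on points upstream of the point being updated, which respects the direction in which disturbances travel in a locally supersonic region.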

In his two-dimensional solution of equation (1), Bailey [7] used successive
line over-relaxation in which difference equations were solved along lines
through the flow extending radially from the body axis. This line relaxation
procedure has been carried a step further in the technique used here in three
dimensions. The difference equations for whole planes of flow cutting the
body axis have been solved simultaneously. The method thus obtained treats
coupling in the $r$ and $\theta$ directions with a direct solution technique which
closely matches the physical coupling found in the flow. Flow disturbances
propagate much more strongly in the $r$ and $\theta$ directions than along the body
in the $z$ direction. This may be seen from a look at the inner equation (2).
There is no $z$ coupling in this equation. This lack of disturbance propagation
along the body is a well known property of transonic flow.
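Each radial line in a line relaxation sweep leads to a tridiagonal system of difference equations. A minimal sketch of the standard elimination for such a system (the Thomas algorithm, without pivoting) is given below; the function name and argument layout are choices made here for illustration.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    lower[i], diag[i], upper[i] multiply x[i-1], x[i], x[i+1] in row i;
    lower[0] and upper[-1] are ignored. Assumes at least two unknowns and
    that no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

lower = -np.ones(5)
diag = 2.5 * np.ones(5)
upper = -np.ones(5)
rhs = np.array([1.0, 0.0, 2.0, -1.0, 3.0])
x = solve_tridiagonal(lower, diag, upper, rhs)
print(x)
```

The work grows only linearly with the number of points on the line, which is why line relaxation sweeps are inexpensive.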

The matrix equations obtained from a line relaxation procedure are in
tridiagonal form, which may readily be solved. The matrix equations for plane
relaxation are not. It is, however, possible to make use of the periodic
nature of the axisymmetric problem in such a way as to transform these matrix
equations into tridiagonal form. This transformation is accomplished by a
basis change which is equivalent to a Fourier transformation in $\theta$.
Reyhner [5] has pointed out that the solution around the body is very nearly

$$\phi_1(r,z) + \phi_2(r,z)\cos\theta .$$

This result appears very clearly in the Fourier transform approach and allows
a considerable saving in both computer time and storage since only a few
Fourier components need be treated. The use of Fourier transforms does
increase, to some extent, the problem of obtaining a stable relaxation scheme.
A simple stable scheme can be obtained, however, which reduces, at zero angle
of attack, to the usual line relaxation algorithm.
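The decoupling by a Fourier transformation in $\theta$ can be illustrated on the periodic second difference that represents $\phi_{\theta\theta}$. In the transformed basis this operator is diagonal, so each Fourier component is left with a problem that couples neighbors in $r$ only. The sketch below checks this with NumPy's FFT; the grid size and the test function are arbitrary choices made here.

```python
import numpy as np

N = 16                                   # points around the body (illustrative)
dtheta = 2.0 * np.pi / N
theta = dtheta * np.arange(N)

def theta_second_difference(f):
    """Periodic central second difference in theta."""
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dtheta**2

# Multiplier of the same operator acting on Fourier component k.
k = np.arange(N)
symbol = -(2.0 - 2.0 * np.cos(2.0 * np.pi * k / N)) / dtheta**2

f = 1.0 + 0.3 * np.cos(theta)            # of the form phi_1 + phi_2 cos(theta)
direct = theta_second_difference(f)
via_fft = np.fft.ifft(symbol * np.fft.fft(f)).real
active = np.count_nonzero(np.abs(np.fft.fft(f)) > 1e-9)
print(np.allclose(direct, via_fft), active)
```

A function of this form excites only the components k = 0 and k = ±1, which is the reason so few Fourier components need to be carried.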

3. DISCUSSION OF RESULTS. The results for computations of the surface
pressure coefficient for bodies with circular arc profiles can be seen in
Figures 2 and 3. Figure 2 shows a comparison of computed and wind tunnel [8]
pressure coefficients along the surface of a 1/10 fineness ratio body at zero
angle of attack in a Mach number .99 free stream. The location of a shock
can be clearly seen just aft of the center of the body.

The solid line shows the results of computations for a body generated
by a perfect circular arc. The wind tunnel model, however, was supported
from the rear by a sting. The effect of the sting was modeled and the
resulting computed pressure coefficients are shown by the dashed line in
this figure. As the angle of attack is zero in the case shown in Figure 2,
the computation is two-dimensional. This same case was computed by Bailey [7]
in his earlier two-dimensional work and the results are identical.
Figure 3 shows a comparison of computed and wind tunnel [9] pressures for
a slightly more slender body of fineness ratio 1/12. The Mach number in this
case was .90, which is too low to allow development of a large supersonic
region with strong shocks. The figure is presented to show the result of a
three-dimensional computation. Unfortunately, wind tunnel data for cases
showing strong shock development were unavailable at angle of attack.
Discrepancies between computed and wind tunnel pressure over the aft portion
of the body are the result of the presence of the wind tunnel sting.

The results presented in these two figures confirm the ability of the
three-dimensional code to predict surface pressures over smooth bodies.
There is little difference between the nose of a typical artillery shell,
which is an ogive, and the front portion of these circular arc bodies.
Artillery shell, however, often exhibit corners, particularly at the
junction between the ogive and cylinder portions and between the cylinder
portion and the boattail. Strong shocks are formed by the collapse of
supersonic regions which are generated by the expansion of the flow over these
corners when the shell is flown at a slightly subsonic velocity (.8 < M < 1).
This phenomenon is demonstrated in Figure 4, which shows the shadowgraph of a
shell at the critical Mach number (M = .926). Note that the shock on the
upper surface of the boattail is nearly off the end. The high pressure on
the lower surface behind the shock on the boattail provides a large lift on
the tail, creating a powerful overturning moment. The flow pattern generated
by the corner at the beginning of the boattail is largely responsible for
the critical behavior of the overturning moment. Thus, the accurate
treatment of corner flow is of prime consideration.

Corners create singularities in the potential. The well known
incompressible result is that flow attains infinite velocity over a corner. The
speed of the flow, however, will clearly become supersonic before it becomes
infinite, so that an incompressible calculation is unacceptable. The corner
problem is, therefore, by nature transonic and can be treated by transonic
techniques. Also, the boundary layer which is well developed over the corner
at the start of the boattail will tend to effectively round this corner so
that a strict mathematical singularity does not exist.

The ability of the present theory to predict flow over a corner can be
seen in Figure 5. Figure 5 shows a comparison of surface pressure
results with wind tunnel experiments [10] for flow over a cone cylinder model
at Mach number 1.1. The theory shows reasonable behavior near the corner
of the cone and cylinder sections. In order to achieve these results it
was necessary to use care in applying boundary conditions at the body surface.
An approach that is often taken is to use solutions of the simpler inner
equation (2) to extrapolate the boundary conditions from the body surface
to the body axis or to some other convenient location. In Bailey's
two-dimensional paper the boundary conditions were extrapolated to the axis
where a series of sources and sinks were placed. The source and sink
strengths were obtained from the solution to the inner equation.
This procedure is not feasible if accurate corner flow is to be obtained.
The inner equation, which is obtained by dropping the nonlinear term from
equation (1), does not apply near corners where, in fact, the nonlinear term
may be large close to the body surface. Boundary conditions must be applied
directly at the surface without extrapolation. Further improvement in the
application of boundary conditions may also be obtained. The usual boundary
condition which is applied at the body surface is given by

$$\phi_r\big|_{\text{surface}} = dR/dz ,$$

where the left hand side is the radial derivative of the potential evaluated
at the body surface and the right hand side is the slope of the body surface.
This is a first order approximation to the body surface boundary condition.
A second order formula is more appropriate and is given by
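A form consistent with the discussion that follows is the flow tangency condition with the axial perturbation velocity retained. The expression below is a sketch under the assumption that the potential is scaled so that the free stream velocity is unity, which matches the comparison of $\phi_z$ with 1 in the text; it should not be read as a quotation of the formula used in the computations.

```latex
\[
\phi_r\big|_{\text{surface}}
  = \left(1 + \phi_z\big|_{\text{surface}}\right)\frac{dR}{dz}
\]
```

When $\phi_z$ is negligible in comparison to 1 this reduces to the first order condition $\phi_r = dR/dz$.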
The first order formula works well as long as $\phi_z$ remains small in
comparison to 1. Near a corner $\phi_z$ may become large enough that it produces
a noticeable effect, as seen in Figure 5. Because of the iterative relaxation
procedure used in solving the potential equation, $\phi_z$ may be obtained at an
old iteration. The right hand side of the second order formula may thus be
evaluated. The effect of using a finer grid spacing may also be seen in
Figure 5. The grid in all cases shown in this figure was made up of 64 points.
In the case labeled fine grid in this figure these points were clustered so
as to give twice the density near the corner. Subsequent calculations have
been carried out with 128 points so as to achieve this same fine density when
more than one corner is present.

The absolute necessity of the care taken with corner flow may be seen
when computations of normal force and pitching moment are made. It is
possible to use the inner solution found from equation (2) to predict both
normal force and pitching moment. It should be noted that there is no Mach
number dependence in equation (2). Because of this, both normal force and
pitching moment will be constant in Mach number and will depend only on the
body shape. Since it is precisely the large transonic variations of normal
force and pitching moment that are desired and since the use of equation (2)
will predict no Mach number dependence of these quantities, equation (2) must
not be valid everywhere on the surface and it is not possible to use equation
(2) to supply boundary conditions. A corollary to this argument is that it is
the areas where equation (2) breaks down which are of interest in obtaining
variations of normal force and pitching moment with Mach number. Such
breakdowns occur near corners and it is for this reason that they are of such
critical interest. Breakdowns also occur around shocks. As seen in Figure 2,
accurate shock location is also vital to the computation of aerodynamic
quantities because of the large pressure difference between the upper and
lower surfaces in the neighborhood of the shock on the boattail.
It has been shown in the above discussion that the transonic techniques
that have been developed will predict flow over both smooth bodies and bodies
with corners. It should, therefore, be possible to obtain the solution over
a body that closely resembles an artillery shell. The lift loading computed
for a typical artillery shell shape is plotted in Figure 6. This graph shows
the normal force per unit length plotted as a function of the position along
the shell. It is felt that the features of this curve, particularly the
large downward spike in the boattail region, give an accurate representation
of the aerodynamic forces on this body. The pitching moment coefficient
computed for this body and range measurements [11] of a similar shell are
compared as functions of Mach number in Figure 7. The peak shown in
the computed results falls a few hundredths of a Mach number higher than the
peak in the range measurements and is not as pronounced.

It  is  felt  that  this  situation  will  improve  with  the  inclusion  of  the 
boundary  layer,  the  inclusion  of  the  rotating  band,  and  a better  modeling 
of  the  wake.  These  improvements  will  be  added  in  the  very  near  future.  At 
present  it  is  felt  that  the  techniques  described  above  will  be  capable  of 
accurate  prediction  of  both  normal  force  and  pitching  moment  for  practical 
shell  configurations.  With  the  inclusion  of  a boundary  layer,  Magnus  forces 
may  also  be  predicted. 
Figure 2. Comparison of Calculated Pressure Coefficients
With Wind Tunnel Data for a Fineness Ratio 1/10
Circular Arc Body, M = 0.99

Figure 3. Comparison of Calculated Pressure Coefficients
With Wind Tunnel Data for a Fineness Ratio 1/12
Circular Arc Body at Angle of Attack, α = 4°, M = 0.90

Figure 4. Spark Shadowgraph of a Typical Projectile
at Critical Mach Number, M = .926, α = 8'

Figure 5. Comparison of Calculated Pressure Coefficients
With Wind Tunnel Data for a 7° Half Angle Cone
Cylinder, M = .99

Figure 6. Computed Normal Force Loading
Along a Typical Projectile

Figure 7. Comparison of Computed Pitching Moment
With Range Data for a Typical Projectile

NUMERICAL SIMULATION OF ELECTRO-CHEMICAL MACHINING

Gunter  H.  Meyer 
School  of  Mathematics 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 

ABSTRACT. Anode erosion in an electrolytic cell can be
described by a free boundary problem for the potential equation.
It is shown that the method of lines solution technique allows
the numerical simulation of electro-chemical machining of anodes
for a variety of tool and work piece shapes.

1.  INTRODUCTION.  While  electro-chemical  machining  has 
become  a common  metal  forming  process,  its  numerical  simulation 
remains  a difficult  and  largely  unsolved  problem.  Yet  full  scale 
simulation  can  aid  in  choosing  tool  shapes  and  feed  rates  and 
thus  have  a direct  influence  on  the  design  and  utilization  of 
electro-forming machinery. The lack of simulation appears to be
due not so much to the difficulty of building a mathematical model
for the process as to the difficulty of solving the model equations
for realistic applications.

Ideally, one would like to determine the tool shape necessary
to produce a specified work piece. Unfortunately, such a
prediction would have to be based on an initial value problem
for the potential equation, which is mathematically unstable. To
date such a formulation remains practically unsolvable. On the
other hand, the prediction of the work piece shape for a given
tool leads to a stable mathematical boundary value problem with
a moving surface, for which numerical methods are now becoming
available.

It is the purpose of this note to indicate how a fairly
simple numerical method for the solution of the potential equation
can be used to predict the shape of the work piece under
dynamic working conditions. The method is implicit in time and
yields simultaneously the potential in the electrolyte and the
anode surface. Implicit methods as a rule allow larger time steps
and more irregular anode shapes than competing explicit methods,
where the anode is predicted and the field equation is solved
in the predicted gap between the electrodes. Moreover, no
special scaling of the geometry is required to avoid the spurious
solutions possible in the integral equation method of [1], nor
is the method restricted to nearly circular electrodes as required
for a perturbation approach [4]. In fact, quite arbitrary
electrodes can be examined with the same computer program, so that
digital simulation appears to be considerably more economical
and flexible than the analog simulation outlined in [2].


2.  THE  MODEL  EQUATIONS  AND  THE  NUMERICAL  METHOD.  In  order  to 
demonstrate  the  numerical  technique  and  to  compare  our  results 
with  those  obtained  recently  by  two  different  methods  we  shall 
consider  the  case  of  a circular  cathode  surrounding  an  irregular 
anode.  Electrolyte  flows  in  the  axial  direction  at  sufficiently 
high  velocity  to  be  considered  of  constant  conductivity.  The 
electrodes are taken to be equipotentials. Following [1] and [4]
we  write 

(2.1a)  $\nabla \cdot \nabla V = 0$  in the electrolyte

(2.1b)  $V = 0$  on the circular cathode $r = R$


(2.1c)  $V = 1$  on the anode

(2.1d)  $\frac{\partial V}{\partial n} = \frac{\partial n}{\partial t}$  on the anode,


where $t$ is a dimensionless variable scaled according to

$t = \frac{M V}{a^2 C^2}\,\tau .$

Here $\tau$ is the actual time, $V$ is the anode potential, $M$ is
the machining constant depending on the electro-chemical equivalent
$e$ and the density $\rho$ of the anode and on the conductivity
$\sigma$ of the electrolyte, and $C$ is the cathode radius. $a$ is a
numerical scale factor of no importance to our method. An initial anode
shape $r = s(\theta,0)$ is given. It is assumed that throughout the
machining process the anode can be expressed unambiguously in
polar coordinates as $r = s(\theta,t)$.

The method of lines algorithm introduced in [3] is suggested
for the solution of the free surface problem (2.1). We shall
advance the solution $\{V(r,\theta), s(\theta,t)\}$ in discrete time
steps of length $\Delta t$ by discretizing $\theta$ and solving (2.1)
as a function of $r$ along the rays $\theta = \theta_i$,
$i = 1,\dots,N$. As is common in the method of lines we write

(2.2a)  $\frac{1}{r}\,(r V_i')' + \frac{1}{\Delta\theta^2 r^2}\,[V_{i+1} + V_{i-1} - 2V_i] = 0$

(2.2b)  $V_i(R) = 0, \quad V_i(s_i) = 1$
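where $V_i$ is the approximate potential along the ray $\theta = \theta_i$.
In order to find the motion of the free surface in the radial
direction we write (2.1d) as

(2.2c)  $\left(\frac{\partial V}{\partial r}, \frac{1}{r}\frac{\partial V}{\partial\theta}\right) = \left(\frac{dr}{dt}, r\frac{d\theta}{dt}\right).$

Since $r = s(\theta,t)$ denotes the anode surface the chain rule yields

$\frac{\partial V}{\partial r} = \frac{dr}{dt} = \frac{\partial s}{\partial\theta}\frac{d\theta}{dt} + \frac{\partial s}{\partial t} = \frac{\partial s}{\partial\theta}\,\frac{1}{r^2}\,\frac{\partial V}{\partial\theta} + \frac{\partial s}{\partial t}.$

It is convenient (but not essential) to replace $\frac{\partial V}{\partial\theta}$.
Since $V = 1$ it follows that the tangential derivative

$D_\tau V = \left\langle \left(\frac{1}{r}\frac{\partial s}{\partial\theta}, 1\right), \left(\frac{\partial V}{\partial r}, \frac{1}{r}\frac{\partial V}{\partial\theta}\right) \right\rangle$

vanishes on the anode. Solving for $\frac{\partial V}{\partial\theta}$ and
substituting into (2.2c) we obtain the following
expression for the movement of the anode along the ith ray

$\frac{\partial s}{\partial t} = \left(1 + \left(\frac{1}{r}\frac{\partial s}{\partial\theta}\right)^2\right)\frac{\partial V}{\partial r}$

which after time and angle discretization leads to

(2.2d)  $V_i'(s_i) = \frac{s_i - s_i(t-\Delta t)}{\Delta t\,\left(1 + \left\{\frac{1}{s_i}\,\frac{s_{i+1} - s_{i-1}}{2\Delta\theta}\right\}^2\right)}$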


The numerical solution of (2.2a,b,d) is based on the algorithm
described in [3], which is included here for the sake of
completeness. An initial guess is chosen for the potential
$\tilde V_i(r)$ and the initial anode location $\tilde s_i$, where
$i = 1,\dots,N$. This guess is usually the solution from the preceding
time level, but any reasonable choice will work. Then the one dimensional
equation


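(2.3a)  $V_i'' + \frac{1}{r}\,V_i' + \frac{1}{(r\Delta\theta)^2}\,[\tilde V_{i+1} + \tilde V_{i-1} - 2V_i] = 0$

(2.3b)  $V_i(R) = 0$

(2.3c)  $V_i(s_i) = 1, \quad V_i'(s_i) = \frac{s_i - s_i(t-\Delta t)}{\Delta t\,\left(1 + \left\{\frac{1}{s_i}\,\frac{s_{i+1} - s_{i-1}}{2\Delta\theta}\right\}^2\right)}$
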


on the interval $[s_i, R]$ is solved for $V_i(r)$ and $s_i$ with the sweep
method given below. Once $V_i$ is found it is used to replace
$\tilde V_i(r)$ according to the formula

$\tilde V_i(r) = \tilde V_i(r) + \omega\,[V_i(r) - \tilde V_i(r)],$

where $\omega$ is a relaxation factor, while the old value of $s_i$ is
replaced by that just found. We cycle through the rays until
$V_i$ and $s_i$ remain stationary for $i = 1,\dots,N$.

The solution of (2.3a,b,c) is based on the so-called method
of invariant imbedding. It is well known that $V_i$ and $V_i'$ are
related through the Riccati transformation

(2.4)  $V_i = U(r)\,V_i' + w(r)$

where $U(r)$ and $w(r)$ are the solutions of the following well
defined initial value problems

(2.5)  $U' = 1 + \frac{U}{r} - \frac{2}{(r\Delta\theta)^2}\,U^2, \quad U(R) = 0$

(2.6)  $w' = -\frac{U(r)}{(r\Delta\theta)^2}\,[2w - \tilde V_{i+1}(r) - \tilde V_{i-1}(r)], \quad w(R) = 0.$
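
The two initial value problems can be checked by direct substitution. The
following is a sketch worked out here from (2.3a) and (2.4), not reproduced
from [3]; the tilde marking the neighbouring-ray potentials that are held
fixed during the sweep is a notational assumption.

```latex
% Differentiate the Riccati transformation V_i = U V_i' + w:
V_i' = U' V_i' + U V_i'' + w' .
% Eliminate V_i'' with (2.3a), writing V_i = U V_i' + w inside the bracket:
V_i'' = -\frac{1}{r} V_i'
        + \frac{1}{(r\Delta\theta)^2}
          \bigl[\, 2(U V_i' + w) - \tilde V_{i+1} - \tilde V_{i-1} \bigr] .
% Substitute and collect the coefficient of V_i' and the remainder:
\Bigl( U' - 1 - \frac{U}{r} + \frac{2U^2}{(r\Delta\theta)^2} \Bigr) V_i'
  + \Bigl( w' + \frac{U}{(r\Delta\theta)^2}
           \bigl[\, 2w - \tilde V_{i+1} - \tilde V_{i-1} \bigr] \Bigr) = 0 .
```

Requiring each bracket to vanish separately gives the equations for $U$ and
$w$, and the initial values $U(R) = 0$, $w(R) = 0$ reproduce the cathode
condition $V_i(R) = 0$.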

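It follows from (2.4) and (2.3b,c) that the boundary $s_i$ must be
chosen so that

$1 = U(s_i)\,\frac{s_i - s_i(t-\Delta t)}{\Delta t\,\left(1 + \left\{\frac{1}{s_i}\,\frac{s_{i+1} - s_{i-1}}{2\Delta\theta}\right\}^2\right)} + w(s_i).$

Hence, as the equations (2.5) and (2.6) are integrated we monitor
the functional

$\phi(r) \equiv U(r)\,\frac{r - s_i(t-\Delta t)}{\Delta t\,\left(1 + \left\{\frac{1}{r}\,\frac{s_{i+1} - s_{i-1}}{2\Delta\theta}\right\}^2\right)} + w(r) - 1.$

Where it passes through zero the anode surface $s_i$ is placed.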


3. A NUMERICAL EXAMPLE. When the electrodes are concentric
circles a closed form solution is available for the position of
the anode surface [2]. Sample computations with the data of [1]
gave answers within fractions of one percent of the analytic
result. For example, advancing the anode from an initial position
$s(\theta,0) = 9.0$ to a final position $s(\theta,2.5) = 7.4374$ in 10 equal
time steps gave a relative error of 0.34%. In this one dimensional
case the number of rays does not influence the final result. In
order to present a more complex application the notching of an
initially circular anode due to a circular cathode was computed.
It is assumed in the scaled variables of (2.1) that $R = 10$,
$s(\theta,0) = 9.5$ and that $t \in [0,2]$. It is further assumed that the
anode surface is masked everywhere except for two angular segments,
one about $\theta = \pi/2$ and its mirror image, so that it can erode only
in these two segments. No movement across the rays bounding these
segments was permitted, and symmetry about $\theta = 0$ and
$\theta = \pi/2$ was used to restrict the calculation to the first
quadrant. Fig. 1 shows a typical result when the equations (2.5,6) are
solved exactly as described in [3]. In order to accentuate the erosion of
the anode in the plot, the plotted radius $r^*$ is a logarithmic rescaling
of $r$. The computed depth of the notch at $\theta = \pi/2$ and $t = 2.0$
was $s(\pi/2, 2) = 7.88$.
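
The sweep can be illustrated on the concentric test case quoted at the start
of this section. The sketch below is not the program of [3]; it assumes that
for concentric circles the angular coupling drops out, so that the Riccati
system reduces to $U' = 1 + U/r$ with $w \equiv 0$ and the boundary factor
$1 + \{\cdot\}^2$ equals one. The value R = 10, the function names
`sweep_anode` and `erode`, the RK4 integrator and the step `dr` are all
choices made here for illustration.

```python
def sweep_anode(s_old, R, dt, dr=1.0e-3):
    """Advance a concentric anode by one implicit time step.

    U' = 1 + U/r is integrated from r = R (where U = 0) toward smaller r
    with classical RK4 while phi(r) = U(r) * (r - s_old) / dt - 1 is
    monitored. The radius where phi changes sign is returned.
    """
    def rhs(r, U):
        return 1.0 + U / r

    r, U = R, 0.0
    phi = -1.0                      # phi(R), because U(R) = 0
    h = -dr                         # step in the direction of decreasing r
    while r + h > 0.0:
        k1 = rhs(r, U)
        k2 = rhs(r + 0.5 * h, U + 0.5 * h * k1)
        k3 = rhs(r + 0.5 * h, U + 0.5 * h * k2)
        k4 = rhs(r + h, U + h * k3)
        U_next = U + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        r_next = r + h
        phi_next = U_next * (r_next - s_old) / dt - 1.0
        if phi < 0.0 <= phi_next:
            # linear interpolation for the zero of phi between r and r_next
            return r - phi * (r_next - r) / (phi_next - phi)
        r, U, phi = r_next, U_next, phi_next
    raise ValueError("phi did not change sign on (0, R)")


def erode(s0, R, t_final, n_steps):
    """Apply n_steps implicit sweeps of equal length and return the anode radius."""
    dt = t_final / n_steps
    s = s0
    for _ in range(n_steps):
        s = sweep_anode(s, R, dt)
    return s


if __name__ == "__main__":
    # Data quoted in the text: s = 9.0 at t = 0, analytic s = 7.4374 at t = 2.5.
    print(erode(9.0, 10.0, 2.5, 10))
```

Placing the anode at the zero of $\phi$ is equivalent to a backward Euler
step for the anode position, which is why the sweep remains usable with
comparatively large time steps; refining the time step should move the
result toward the analytic value.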


4. EXTENSIONS. The algorithm was presented for the model
equations (2.1) to allow a simple description of the method of
lines approach. However, the algorithm applies to general field
equations subject to nearly arbitrary boundary conditions. Thus
it is possible to use electrodes which are not necessarily
equipotentials, to account for electrolytes with variable
conductivity, and to model in more detail the erosion of the anode. The
method is also applicable in three dimensions. These extensions
are obtainable for moderate cost in program complexity but will
significantly affect the computation times.

Fig. 1. Scaled plot of the anode and cathode surfaces during
electro-chemical erosion. The anode is plotted after every
ten time steps.
$\Delta r = 0.025$, $\Delta\theta = \pi/40$, $\Delta t = 0.04$, $\omega = 1.3$.
After about 10 SOR iterations per time step the anode surface and the
potential in the electrolyte changed by less than 10
CDC Cyber 74 computer time: 100 secs.

ABSTRACT. Ryad is the popular name for the Unified System, the Soviet Bloc's
upward  compatible  family  of  third  generation  mainframes  that  are  an  effective 
functional  reverse  engineering  of  the  IBM  S/360.  Although  backward  by  current 
Western  and  Japanese  standards,  the  Unified  System  is  of  considerable  technological 
and  economic  importance.  This  paper  is  a brief  introduction  to  the  Ryad  project. 


I am  a consultant  for  the  U.  S.  Army  Foreign  Science  and  Technology  Center 
on  the  subject  of  Soviet  computing.  This  brief  introduction  to  Ryad  is  part  of  a more 
comprehensive study of the Unified System project that I have been working on with
Norman Davis of the CIA [Davis and Goodman 78]. The views expressed here do
not  necessarily  reflect  official  opinion  or  policy  of  the  US  Army  or  the  CIA. 

During  the  last  10  years  the  USSR  and  its  CEMA*  allies  have  designed,  developed 
and put into production a family of upward compatible third generation computers known
as  the  Unified  System  (ES)  or  Ryad**.  This  system  is  an  effective  functional  reverse 
engineering  of  the  IBM  S/360  series,  and  provides  Soviet  Bloc  users  with  unprecedented 
quantities  of  reasonably  good  general  purpose  hardware  and  software. 

In keeping somewhat with the theme of this conference, I will try to direct
this short paper a bit towards software considerations.

*Council for Economic Mutual Assistance: Bulgaria, Czechoslovakia, German Democratic
Republic  (GDR),  Hungary,  Poland  and  the  USSR.  Cuba,  Mongolia  and  Romania  have 
weaker  affiliations. 

**ES  is  a transliterated  abbreviation  of  the  Russian  for  Unified  System.  The  Cyrillic 
abbreviation,  EC,  and  an  alternate  transliteration,  YeS,  are  also  commonly  used. 
Language  differences  among  the  participating  countries  produce  other  variants;  for 
example,  the  Polish  abbreviation  is  JS.  Ryad  (alternate  transliteration:  Riad)  is  the 
Russian  word  for  "row"  or  "series".  The  prefix  R is  sometimes  used  to  designate  computer 
models. 




The  poor  state  of  Soviet  and  East  European  hardware  before  Ryad  made  it 
difficult  to  develop  large,  modern  software  systems  for  general  purpose  computing. 
Of  several  major  handicaps,  three  are  particularly  important: 

(1)  Small  primary  memories.  Most  machines  were  provided 
with  no  more,  and  frequently  much  less,  than  32K  words  of 
poor  quality  core  memory. 

(2) For all practical purposes, disk storage was not widely
available  until  1973.  Secondary  storage  was  generally  on 
poor  quality  magnetic  tape  units. 

(3) There was a lack of suitable and reliable peripherals.
Card readers and alphanumeric printers were not generally
available until the mid-to-late 1960s. The units that were
later produced, and their associated paper products, were
of notoriously poor quality and reliability.

This  situation  was  made  worse  by  hardware  vendor  practices.  They  delivered 
nearly  empty  machines.  Users  had  to  write  all  but  the  most  basic  utility  programs. 
Furthermore,  the  users  had  to  maintain  the  hardware  themselves.  This  eventually 
led  to  local  engineering  modifications  that  made  it  difficult  or  impossible  for  users 
with  the  same  basic  CPU  model  to  share  software. 

Because  of  these  and  other  factors,  the  pre-Ryad  software  situation  could  be 
summarized as follows:

(a)  Software  existed  in  the  form  of  many  isolated  pockets 
of  machine  language  programs.  There  was  very  little 
portability. 

(b)  Computer  centers  were  essentially  on  their  own  once 
the  hardware  was  delivered. 

(c)  Many  applications,  especially  those  relating  to 
non-numeric  computing,  were  out  of  the  range  of  the 
hardware. 

(d)  Little  experience  had  been  built  up  in  the  development 
of  large,  modern  software  systems. 

(e)  Computers  were  not  accessible  to  users  who  had  not 
had  much  technical  training. 

During  the  early  60s  the  Soviets  began  to  appreciate  the  need  for  an  upward 
compatible  family  of  general  purpose  computers  for  the  management  of  assorted 
hierarchies  of  personnel  files  and  transportation  networks.  A Soviet  designed  series 
and  a first  attempt  to  duplicate  the  S/360  architecture  had  both  essentially  failed 
by  the  late  1960s. 

The  Unified  System  represents  a much  more  serious  attempt.  When  first 
announced  in  1967,  Ryad  was  a Soviet  project  that  was  almost  certainly  to  be  based 
on a Soviet design. The USSR had good political, economic, technical and military
reasons  to  want  to  turn  this  into  a joint  CEMA  effort  that  would  provide  the  Soviet 
Bloc  with  a uniform  hardware  and  systems  software  base.  By  1969  the  Soviet  Union 
had  succeeded  in  coercing  and  cajoling  most  of  the  rest  of  CEMA  into  joining  the 
project.  The  Bulgarians  and  East  Germans  were  the  most  cooperative.  The  Poles, 
Hungarians,  and  Czechoslovakians  were  less  than  thrilled  with  the  plan.  Romania 
remains  a holdout;  it  has  little  more  than  a superficial  relationship  to  the  Ryad 
project. 

The  decision  to  copy  the  S/360  architecture  was  made  shortly  thereafter.  The 
GDR  was  the  major  advocate  of  this  course  of  action.  This  decision  was  primarily  a 
reflection on the East-West software gap. These countries had not had much
experience in developing large, modern software systems nor had they accumulated a
large, economically significant collection of applications software. The plan was
to  expropriate  the  IBM  S/360  operating  systems.  Then  with  this  base  of  systems 
software,  they  would  be  in  a position  to  borrow  the  huge  quantities  of  other  systems 
and  applications  software  that  had  been  developed  over  the  years  by  IBM  and  its 
customers.  This  plan  has  been  followed  with  considerable  success  and  represents 
the  most  impressive  technology  transfer  in  Soviet  history. 

The  effort  put  into  the  development  and  production  of  the  Unified  System  was 
considerable.  Together  the  CEMA  countries  invested  the  efforts  of  70-80  research 
and  production  enterprises  and  over  300,000  scientists,  engineers  and  skilled  workers. 

Ryad production started in early 1972. The first group of ES models, the
Ryad-1s, are listed below along with their IBM S/360 equivalents. Further details
are provided in Table I.

ES-1010 (Hungary) - French Mitra 15

ES-1021 (Czechoslovakia) - Production model of EPOS-2

ES-1020 (Bulgaria, USSR) - 360/30

ES-1030 (Poland, USSR) - 360/40

ES-1040 (GDR) - 360/50 to 65

ES-1050 (USSR) - 360/65 to 75

ES-1060 (USSR) - 360/75 to 85

The 1010 and 1021 are not part of the "real" upward compatible ES family.
The Hungarian 1010 is the French Mitra 15 minicomputer produced under license
(the Mitra 15 is itself a licensed version of the SDS Sigma 5). The Czech 1021 is
based on a local design and none are known to be used outside of Czechoslovakia.
The inclusion of these two machines represents a political compromise that was
necessary to get some kind of non-trivial Ryad participation from these two countries.

Transmission rate (K byte/sec)               240   120-300   120-300   600   1200   1300   1300

Multiplexor channel
  Transmission rate in multiplex mode         40    35        10-16     40    110    110    150
  (K byte/sec)

Basic peripheral configurations**
  Magnetic tape units
  Magnetic tape control units
  Magnetic disk units

Selected Characteristics of Ryad-1 Computer Systems
Table I


With  the  possible  exception  of  the  1040,  all  of  the  ES  models  had  severe  birth 
pains  and  their  introduction  was  spread  out  over  a longer  period  than  their  S/360 
counterparts.  Initial  production  rates  were  about  10%  of  those  for  the  S/360  and 
later  rates  rose  to  around  15-20%.  Not  surprisingly,  like  IBM  the  CEMA  countries 
had  their  worst  problems  at  the  high  end  of  the  line.  The  1050  did  not  appear  in  a 
viable  form  until  1976  and,  after  much  delay,  production  of  the  1060  was  finally 
announced  at  the  end  of  1977  (although  there  are  no  known  installations  as  of 
February,  1978). 

This first group of Ryads has since been followed by several interim models.
These are shown in Table II. All are essentially straightforward upgrades of Ryad-1
models by their design and production plants. The Poles had never really developed
a production version of their 1030; they dragged their heels and continued to devote
their attention to the ICL-like ODRA project. The 1032 is compatible with the main
group of ES models and represents Poland's real entry into the Unified System
mainframe business.

The  CEMA  countries  are  now  developing  a new  group  of  Ryad-2  models  that 
will  have  much  the  same  relationship  to  the  earlier  Ryads  as  the  IBM  S/370  has  to 
the  S/360.  Selected  design  parameters  for  the  new  machines  are  given  in  Table  III. 

New  features  to  be  available  in  the  new  members  of  the  Unified  System  include 
much  larger  primary  memory,  semiconductor  primary  memory,  virtual  storage, 
block  multiplexor  channels,  relocatable  control  storage,  improved  peripherals, 
and  expanded  system  timing  and  protection  facilities.  There  are  also  plans  for 
dual  processor  systems  and  extended  teleprocessing  capabilities.  By  early  1977 
most  of  the  new  models  were  well  into  the  design  stage.  The  appearance  of 
prototypes  and  the  initiation  of  serial  production  will  probably  be  scattered  over 
the  next  five  years.  Prototypes  for  the  1035  and  1055  may  appear  during  1978. 

The  Hungarians  plan  to  continue  with  the  development  of  new  minicomputers  under 
the  Unified  System  program. 

The  secondary  storage  and  peripherals  available  with  the  Ryad-1  and  interim 
models  were  similar  to  those  produced  by  IBM  in  the  mid-to-late  60s.  Peripheral 
reliability  is  still  not  up  to  S/360  standards.  Not  surprisingly,  disk  technology  has 
been a major problem. Until around 1976, 7.25M byte disk packs were the only
units  available.  Now  30M  byte  units  are  slowly  becoming  available,  but  are 
hardly  in  widespread  use.  Efforts  are  in  progress  to  master  100M  byte  unit  production. 
Surprisingly,  the  Bulgarians  have  concentrated  on  disk  technology  and  have  emerged 
as  the  Communist  world's  disk  specialists.  Even  the  East  Germans  have  essentially 
terminated  their  disk  development  effort  in  favor  of  the  Bulgarian  units.  The  USSR 
is  the  only  country  to  maintain  an  indigenous  capability  in  all  areas  of  secondary  storage 
and  peripheral  devices;  this  apparently  reflects  the  usual  conservative  paranoia  of 
the  Soviet  military.  Characteristics  of  commonly  used  secondary  storage  and  peripheral 

Model                                    ES-1022          ES-1032    ES-1033    ES-1012

Responsible Country                      Bulgaria, USSR   Poland     USSR       Hungary

Processor
  Operating speed* (K opns/sec)          80               200        200
  Selected performance times (μsec)
    Short operations                     9                2.5-4.0    1.4-2.7    2.6
    Floating point add/sub.              30               4.5        4.5        n/a
    Fixed point multiply                 80               9.0        8.5        8.5
    Floating point divide                100              14.0       17.7       n/a
  Instruction set                                                               Special, 109 instr.
  Principle of processor control

Primary memory
  Capacity (K bytes)                     128-512          128-1024   256-512    8-64
  Cycle time (μsec)                      2.0              1.2        1.2        1.0
  Length of accessed word (bytes)        4                4          4          2

Channels
  Selector channels
    Number                               2                3          3
    Transmission rate (K byte/sec)       500              1100       800
  Multiplexor channel
    Transmission rate in multiplex mode  40               110        70         40
    (K byte/sec)

*See Table I

Sources: [Kamburelis 75, Bratukhin 76, GDR 76, Budapest 77]. There were some significant
differences among the numbers given by these sources.

Selected Characteristics of Interim Ryad Computer Systems
Table II

Model                 ES-1025   ES-1035   ES-1045   ES-1055            ES-1055         ES-1065
                                                    (without buffer)   (with buffer)

Responsible Country   Czech     USSR      Poland    GDR                GDR             USSR

Selected Characteristics of Ryad-2 Computer Systems
Table III


devices may be found in [CSTAC I 78]. New devices projected for the Ryad-2
models  are  described  in  [CSTAC  II  78]. 

In  spite  of  these  negative  comparisons  with  respect  to  IBM,  the  fact  remains 
that  the  Soviets  and  their  CEMA  partners  are  doing  well  with  respect  to  their  own 
past.  They  have  done  much  to  correct  the  major  data  processing  hardware 
deficiencies  mentioned  earlier. 

The  CEMA  countries  did  not  have  an  easy  time  adapting  the  IBM  operating 
systems  to  the  Ryad  hardware.  They  consistently  underestimate  the  difficulties 
associated  with  the  development  and  maintenance  of  software  even  more  than  we 
do. They got DOS up reasonably quickly, but had a lot more trouble with OS.
The ES hardware is not as fully family compatible as the S/360 hardware and this
may have necessitated the separate adaptation of DOS and OS to each of the Ryad
models (excluding the 1010 and 1021 of course). In any case, the effort seems to
have  taken  them  more  time  than  it  took  IBM  to  build  these  operating  systems  in  the 
first  place.  Continued  maintenance  is  nowhere  near  Western  standards.  The  East 
Germans  and  Hungarians  seem  to  do  the  best  job  of  taking  care  of  their  systems 
software,  the  Soviets  the  worst. 

The  basic  Ryad  plan  was  very  conservative.  It  was  to  functionally  duplicate 
S/360  and  to  make  the  hardware  useful  to  the  general  economy  as  quickly  as 
possible.  Lots  of  resources  were  committed  and  priorities  came  down  from  the  highest 
Party  and  government  levels.  As  far  as  we  can  tell,  the  Soviet  Bloc  made  no  effort 
to  do  any  more  than  this  — for  example,  to  incorporate  S/370-like  features  into  the 
early  ES  models.  It  is  moot  to  speculate  as  to  whether  they  could  have  done  more. 

Actually,  it  is  amazing  that  they  were  able  to  succeed  as  well  as  they  have. 
The  problems  that  had  to  be  overcome  were  enormous.  There  were  language  barriers, 
the  difficulty  of  trying  to  duplicate  sophisticated  foreign  technology,  a small 
computer  industry  and  weak  support  industries,  poor  telecommunications  and  long 
physical  distances,  assorted  international  bad  feelings,  and  an  untested  control 
structure  supervising  many  development  and  production  facilities  that  had  not 
worked  together  before. 

The effective use of the Ryad computers is quite another matter that is the
subject of other papers. It will suffice to note here that assorted systemic factors
(i.e., those that relate to institutional arrangements and economic practices) make
the Soviets their own worst enemy.

SOME ALGORITHMS FOR THE ANALYSIS OF COMPUTER PROGRAMS*

Lloyd  D.  Fosdick 
Department  of  Computer  Science 
University  of  Colorado  at  Boulder 
Boulder,  Colorado  80309 


ABSTRACT.  The  analysis  of  computer  programs  is  an  important  part  of 
program  translation,  error  detection,  optimization,  and  documentation.  It 
consists  of  two  distinct  activities:  the  construction  of  an  abstract  model 
of  a program,  given  the  program  itself  in  some  language  such  as  FORTRAN,  and 
the  extraction  of  information  from  the  program  by  examination  of  the  model.  A 
labeled,  directed  graph  is  a model  that  is  often  used.  In  recent  years  workers 
in  theoretical  computer  science  have  constructed  and  analyzed  algorithms  for 
solving  problems  on  labeled,  directed  graphs  which  are  directly  related  to 
important  problems  arising  in  the  analysis  of  computer  programs.  Some  of  these 
algorithms  and  their  applications  are  described.  The  discussion  does  not 
assume  a knowledge  of  graph  theory. 

1.  INTRODUCTION.  The  analysis  of  computer  programs  is  important  in  error 
detection,  optimization,  documentation,  translation  and  many  other  activities 
associated  with  the  design,  construction,  and  maintenance  of  software.  It  is 
difficult  because  of  the  size  and  complexity  of  programs.  Generally  speaking, 
program  analysis  has  been  an  unorganized  subject,  consisting  of  a variety  of 
ad hoc techniques with little unifying structure, but this situation is changing.
One reason for this change is the development of good algorithms for
recognizing  the  implicit  relationships  in  directed  graphs.  I will  describe 
some  of  these  algorithms  and  how  they  can  be  used  on  problems  in  program 
analysis.  In  doing  so  I hope  to  provide  an  indication  of  how  we  can  begin 
to  organize  the  subject  of  program  analysis. 

I want  to  make  a careful  distinction  between  the  analysis  of  algorithms 
[1]  and  the  analysis  of  programs.  The  analysis  of  an  algorithm  starts  with 
the  algorithm  and  focuses  on  the  problem  of  obtaining  time  and  memory  space 
bounds  or  expectation  values  in  terms  of  a few  parameters  of  the  underlying 
problem.  The  analysis  of  a program  starts  with  a program,  say  in  FORTRAN  or 
COBOL, and proceeds to some abstract model of it on which the analysis is
performed. The "program" is expressed exactly as it will be, or has been, read
into the store of the computer. The construction of the abstract model of the
program is a critical step in the analysis of programs, critical because
decisions must be made about which information is to be discarded and which
is to be retained, and the analysis will suffer if too much or too little is
discarded. Program analysis is concerned with time and memory space
requirements but it is also concerned with far more detailed information than is
algorithm analysis; for example, program analysis is concerned with which
variables are assigned values in certain areas of the program, which
subroutines are called by which other subroutines, which variables depend on which
other variables, and so forth. Finally, because of the large amount of
information that arises in program analysis, most of the work, if not all, is done
by computers whereas the analysis of algorithms is done by humans.

2. PROGRAM ABSTRACTIONS. The basic structure for a program abstraction
is a directed graph [2]. The subject of graphs dates back to Euler and the
famous Königsberg bridge problem [3]. Graphs appeared in automatic computing
at an early date [4] and since then they have been used extensively to
represent machines, programs, data structures and problems attacked by computers
such as network flow problems. A directed graph is shown in Fig. 2.1. It is
denoted by G(N,E) where N is a set of points called nodes, joined by a set, E,
of directed lines, called edges. Letters or numbers are used to identify nodes
as in

N = {a,  b,  c,  d,  e,  f}  or  N = {1,  2,  3,  4,  5,  6} 

and  ordered  pairs  of  nodes  are  used  to  represent  edges,  as  in 

E = { (a ,b) , (b,c) , (b,d) , (d,e),  (e,d),  (d,f),  (c,f)} 

to  represent  the  set  of  edges  of  the  graph  in  Fig.  2.1.  The  edge  (a,b)  is 
said  to  leave  a and  enter  b.  The  notation  Si  is  used  to  represent  the  number 
of  elements  in  the  set  S;  ; N ' = 6 and  |Ej  = 7 for  the  graph  shown  in  Fig.  2.1. 

A path in a graph is a sequence of nodes joined by edges: for example,
a → b → d → e and b → c → f are paths in Fig. 2.1. Sometimes intermediate
nodes along the path are suppressed and the notation a →* e is used to denote
a path from node a to node e. The length of a path is the number of edges on
the path: the length of a → b → c → f is three. A path in which the first and
last nodes are the same is a cycle; for example, d → e → d and
e → d → e → d → e are cycles in Fig. 2.1. A path such as a → b → d → e → d in
Fig. 2.1 is said to contain a cycle. If no path in the graph is a cycle the
graph is called acyclic. If every node, save one called the root, has exactly
one edge entering it and the root has no edges entering it, then the graph is
a tree. A tree is an acyclic graph but the converse is not true. A tree and an
acyclic graph which is not a tree are shown in Fig. 2.2. If (i,j) is an edge
in a tree then node i is called the parent of node j.
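
As an illustration only, the definitions above can be written down directly from the sets N and E listed for Fig. 2.1. The sketch is in Python, which is not a language used in the original discussion, and the function names are hypothetical. The tree test follows the wording of the text (one root with no entering edge, every other node with exactly one entering edge).

```python
# Node and edge sets for the graph of Fig. 2.1, as listed in the text.
N = {"a", "b", "c", "d", "e", "f"}
E = {("a", "b"), ("b", "c"), ("b", "d"), ("d", "e"),
     ("e", "d"), ("d", "f"), ("c", "f")}


def is_path(nodes, edges):
    """True if each consecutive pair of nodes is joined by an edge."""
    return all((u, v) in edges for u, v in zip(nodes, nodes[1:]))


def path_length(nodes):
    """The length of a path is the number of edges on it."""
    return len(nodes) - 1


def is_cycle(nodes, edges):
    """A cycle is a path whose first and last nodes are the same."""
    return len(nodes) > 1 and nodes[0] == nodes[-1] and is_path(nodes, edges)


def is_tree(node_set, edges):
    """Tree test as worded in the text, based on counts of entering edges."""
    indegree = {n: 0 for n in node_set}
    for _, v in edges:
        indegree[v] += 1
    roots = [n for n in node_set if indegree[n] == 0]
    return len(roots) == 1 and all(
        indegree[n] == 1 for n in node_set if n != roots[0])
```

For the graph of Fig. 2.1 the tree test fails because nodes d and f each have two entering edges.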

It  is  common  practice  to  represent  control  flow  with  a graph  called  a 
flow  graph.  In  such  a representation  the  nodes  identify  statements  and  the 
edges  identify  the  order  of  execution  of  the  statements.  When  one  considers 
creating  such  an  abstraction  mechanically  it  becomes  immediately  evident  that 
a simple one-to-one mapping of statements to nodes is not always adequate. An
example of the kind of problem that can be encountered is the DO statement in
FORTRAN, which actually consists of three more elementary statements separated
by a group of statements following the DO. This is illustrated in Fig. 2.3.
Even though a node in a flow graph may not represent a
statement  exactly  because  of  such  complications  it  is  convenient,  and  should 
cause  no  confusion  here,  to  speak  of  nodes  in  flow  graphs  as  representing 
statements.  A node  in  a flow  graph  with  no  edges  entering  it  is  called  an 
entry  node,  and  a node  with  no  edges  leaving  it  is  called  an  exit  node.  A sub- 
routine in  ANS  FORTRAN  would  be  represented  by  a flow  graph  with  one  entry 
node,  corresponding  to  the  first  executable  statement  and  one  or  more  exit 
nodes  corresponding  to  RETURN  statements.  It  is  customary  to  restrict  flow 
graphs  to  have  a single  entry  node  and  to  represent  them  with  the  notation 
G(N,E,n0)  where  n0,  an  element  of  the  set  N,  is  the  single  entry  node. 

The nodes of a flow graph may represent larger structures than individual
statements: they may represent statement blocks, cycles, subroutines, and so
forth. Each of these is a different program abstraction retaining some aspects
of  control  flow  and  suppressing  others.  A statement  block  is  a sequence  of 
statements  such  that  control  always  flows  from  one  immediately  to  the  next; 
thus  a branch  statement  can  only  appear  at  the  end  of  a block  and  only  the 
first  statement  can  be  branched  to  by  some  other  statement.  The  transforma- 
tion of  a flow  graph  in  which  nodes  represent  statements  to  one  in  which  nodes 
represent  statement  blocks  is  illustrated  in  Fig.  2.4.  The  resulting  graph 
is  more  compact  but  still  shows  all  branches  in  control  flow  and  all  cycles. 

The  transformation  of  such  a graph  into  one  in  which  cycles  are  suppressed  by 
collecting  them  into  single  nodes  is  illustrated  in  Fig.  2.5.  Here  cycle 
information  is  lost  but  connectivity  relationships  between  cycles  are  retained. 
In  graph  terminology  the  nodes  of  the  resulting  flow  graph  represent  the 
strongly  connected  components  of  the  original  flow  graph.  When  the  nodes  of 
a flow  graph  represent  subroutines  or  procedures  and  the  edges  represent  one 
or  more  procedure  calls,  the  graph  is  named  a call  graph.  This  is  illustrated 
in  Fig.  2.6.  Since  FORTRAN  prohibits  recursion  a call  graph  for  a FORTRAN 
program  must  be  acyclic.  The  use  of  a call  graph  together  with  flow  graphs 
for  individual  subroutines  naturally  partitions  the  abstract  representation 
of  a program  into  more  manageable  form  for  analysis. 

Given a flow graph the following questions may be asked about its
structure:

1) Is it acyclic?

2)  What  nodes  lie  on  some  path  from  a given  node? 

3)  Is  it  possible  to  construct  a path  which  includes  a given  set 
of  nodes  (edges)? 

4)  Can  you  find  a path  from  the  entry  node  to  an  exit  node  which 
does  not  include  both  members  of  certain  pairs  of  edges? 

5)  Which  sets  of  nodes  have  the  property  that  there  is  a path  from 
any  node  to  any  other  node  of  the  set? 

These  questions  and  others  are  related  to  important  questions  that  may  be  asked 
about  a program: 

1')  Is  it  possible  for  this  program  to  loop  indefinitely  during 
execution? 

2') Can statement s_i be executed before statement s_j?

3')  What  is  the  smallest  set  of  test  data  needed  to  execute  every 
statement  (traverse  every  edge)  once? 

4')  Can  you  find  an  executable  sequence  of  statements? 

5') Can statements s_i and s_j be in a cycle?

The questions about graph structure clearly can be answered by
enumeration since the graphs are finite, but graphs derived from programs may
be large and enumeration impractical. Therefore it is important to find
algorithms which are significantly faster than enumeration and it is important
to know about the complexity of these problems. It is known, for example, that
question 4 is a problem that is called NP-complete [5]: this means that the
worst case execution time of any algorithm for solving this problem on
arbitrary graphs is almost certainly going to be exponential in the size of
the graph as measured by |N| and |E|. Other questions in this group can be
answered by very fast algorithms; for instance question 5 can be answered by
an algorithm that has time complexity O(|N|+|E|).*

Association of information describing data actions with the nodes of a
flow graph makes it possible to analyze sequences of data actions, called data
flow. For example, consider a particular variable, say X, and label each node
with the symbol r, d, or e according as X is referenced (its value is fetched
from memory as in Y ← X+1), defined (a value is assigned to X as in X ← Y+1),
or neither referenced nor defined. If more than one action on the variable
takes place, as in X ← X+1, then an appropriate sequence of action symbols, rd,
is attached to the node. An example is shown in Fig. 2.7: every path in this
labeled graph corresponds to a sequence of data actions, as illustrated below
for the variables X and Y.

Path                             Data Actions

1 → 2 → 3 → 5 → 7                e d r e            (X)
                                 e e r e e          (Y)

1 → 2 → 3 → 4 → 6 → 4 → 7        e d r r rd r e     (X)
                                 e e r r e r e      (Y)

Looking  at  the  data  flow  described  on  the  right  it  is  evident  that  Y must  be 
assigned  a value  before  entry  to  the  program  since  there  are  paths  on  which 
the  first  action  is  r.  It  also  appears  that  X need  not  be  assigned  a value 
before  entering  the  path  since  it  is  defined  before  any  reference;  further 
consideration shows that X does not need to be assigned a value before
entering any path starting at n0.

Analysis  of  data  flow  can  provide  answers  to  the  following  questions  about 
a program: 

1)  Are  undefined  variables  referenced? 

2)  Are  there  unnecessary  definitions  of  values? 

3)  Which  parameters  in  a procedure  call  need  to  be  assigned  values 
before  entry? 

4)  Which  parameters  in  a procedure  call  may  have  altered  values  upon 
return? 


*Let f(|N|,|E|) represent the time to execute the algorithm as a function of
|N| and |E|; then time complexity O(|N|+|E|) means

$$\lim_{|N|+|E| \to \infty} \frac{f(|N|,|E|)}{|N|+|E|} = k,$$

k a constant unequal to zero. Time complexity O(|N|+|E|) implies that the
approximation f(|N|,|E|) = k × (|N|+|E|) may be used for large |N|+|E|.

Analysis of data flow can detect programming errors. If the first data action
on a path is r then it is likely that a data initialization has been omitted,
or the reference action is incorrect because the variable was misspelled, or
the referencing statement is in the wrong place, and so forth. Similarly if
a d is followed by a d without an intervening r then a variable may have been
misspelled, causing the redundant definition or causing the omission of the
intervening reference. Analysis of data flow can also assist in the global
optimization of programs, in providing documentation aids, and in program
modification.
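
The two suspicious patterns just described (a reference as the first action on a path, and a definition followed by a definition with no intervening reference) can be checked mechanically on an action sequence. The following is a minimal sketch in Python under stated assumptions: the node labels are our reading of the sequences tabulated above for the path 1 → 2 → 3 → 4 → 6 → 4 → 7 (node 6 is taken to carry rd for X), and the function names are hypothetical.

```python
# Data actions on X and Y at each node, as read from the second tabulated
# path; "e" means the variable is neither referenced nor defined there.
ACTIONS_X = {1: "e", 2: "d", 3: "r", 4: "r", 6: "rd", 7: "e"}
ACTIONS_Y = {1: "e", 2: "e", 3: "r", 4: "r", 6: "e", 7: "e"}


def action_sequence(path, actions):
    """Concatenate the action symbols met along a path of node numbers."""
    return "".join(actions[n] for n in path)


def anomalies(sequence):
    """Report the two suspicious patterns named in the text."""
    acts = sequence.replace("e", "")  # ignore nodes with no action
    found = []
    if acts.startswith("r"):
        found.append("reference before definition")
    if "dd" in acts:
        found.append("definition followed by definition")
    return found


path = [1, 2, 3, 4, 6, 4, 7]
x_report = anomalies(action_sequence(path, ACTIONS_X))
y_report = anomalies(action_sequence(path, ACTIONS_Y))
```

On this path the check flags Y, whose first action is a reference, and reports nothing for X, in agreement with the discussion of the table.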

It  should  be  evident  from  this  that  algorithms  for  manipulating  graphs 
and  for  extracting  implied  relationships  in  graphs  have  many  applications  in 
program  analysis.  However,  not  all  useful  abstractions  require  graphs.  In 
some  applications,  for  example,  sets  of  variables  are  sufficient:  the  set  of 
variables  declared  as  formal  parameters  to  a procedure  should  be  a subset  of 
the  set  of  variables  used  in  a procedure;  sets  of  variables  which  are  equivalent 
in  the  sense  that  all  variables  in  the  set  represent  the  same  memory  location 
are important for a consideration of aliasing. Thus algorithms for set
operations are also important for program analysis. Here, however, our
attention is directed at graph algorithms.

3.  ALGORITHMS.  A flow  graph  contains  explicit  and  implicit  information. 
Explicit  information  is  information  associated  with  a node  or  edge  which  is 
independent  of  information  at  other  nodes  or  edges.  Thus  it  is  local  informa- 
tion about  the  program,  determined  at  a cost  that  is  independent  of  the  program 
size.  Examples  of  explicit  information  are:  the  statement  type;  the  list  of 
variables  appearing  in  a statement;  the  branch  condition  on  an  edge,  for  exam- 
ple the  fact  that  A :*  0 is  true  if  the  edge  is  traversed  during  execution. 
Implicit  information  may  be  associated  with  a node  or  edge  also,  or  it  may  be 
associated  with  a larger  structure  including  the  entire  flow  graph:  it  is 
dependent  on  information  at  more  than  one  node  or  edge,  perhaps  all  of  them. 

Thus  implicit  information  is  global  information  that  may,  and  usually  does, 
depend  on  the  program's  size.  Examples  of  implicit  information  are:  the  set 
of  statements  which  can  be  reached  on  all  paths  from  a given  statement;  the 
set  of  variables  on  which  the  first  data  action  will  be  definition  on  all  paths 
from  a given  statement;  the  set  of  statements  which  are  on  all  paths  to  a par- 
ticular statement.  Explicit  information  is  collected  at  the  time  the  abstract 
model  of  the  program  is  created.  It  is,  in  fact,  part  of  the  model  itself. 
Implicit  information  is  derived  from  the  model.  It  is  the  derivation  of  this 
implicit  information  that  is  the  focus  of  attention  in  the  subsequent  discus- 
sion. 

Two mechanisms are frequently used for deriving information from a flow
graph: one consists of performing graph transformations [6,7]; the other
consists of performing a search on a graph [8,9]. When graph transformations
are used a flow graph G(N,E,n0) is transformed into another flow graph
G'(N',E',n0'):
the  transformation  is  not  only  one  of  structure  but  also  one  of  information 
attached  to  nodes  and  edges.  This  approach  generally  consists  of  a sequence  of 
transformations,  each  being  an  elementary  transformation  in  some  sense.  Through 
an  appropriately  chosen  sequence  of  transformations  global  information  about  the 
program  can  be  collected  and  distributed  to  appropriate  nodes.  A search  con- 
sists of  moving  over  the  graph  in  a systematic  manner  dictated  by  the  relation- 
ships between  nodes  implied  by  the  edges.  During  the  search  global  information 
about  relationships  can  be  collected.  Transformations  may  be  used  to  assist 
the  search,  and  transformations  may  be  used  in  place  of  a search. 
Two kinds of search on a flow graph are common: depth-first and
breadth-first. A depth-first search is defined by the following algorithm:

Algorithm (DFS)

1. Push the entry node on a stack and mark it (this is the first node
   visited; nodes are marked to prevent visiting them more than once).

2. While the stack is not empty do

   2.1 If there is an edge from the node at the top of the stack to an
       unmarked node then push the unmarked node on the stack and mark
       it, else pop the stack.

3. Stop.

A breadth-first search is defined by the following algorithm:

Algorithm (BFS)

1. Put the entry node on a queue and mark it.

2. While the queue is not empty do

   2.1 If there is an edge from the node at the head of the queue to
       an unmarked node then add the unmarked node to the end of the
       queue and mark it, else remove the node at the head of the queue.

3. Stop.
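
The two algorithms can be transcribed almost line for line. The sketch below is in Python and is illustrative only: the successor lists are taken from the edge set E given for Fig. 2.1, the order of successors within each list is an assumption, and the function names are hypothetical. For closeness to the text each step rescans the edges of the current node, which a production version would avoid.

```python
from collections import deque

# Successor lists for the graph of Fig. 2.1 (successor order is assumed).
FIG_2_1 = {"a": ["b"], "b": ["c", "d"], "c": ["f"],
           "d": ["e", "f"], "e": ["d"], "f": []}


def dfs_order(succ, entry):
    """Order in which nodes are marked by Algorithm DFS."""
    marked = {entry}
    order = [entry]
    stack = [entry]                      # step 1
    while stack:                         # step 2
        top = stack[-1]
        unmarked = [v for v in succ[top] if v not in marked]
        if unmarked:                     # step 2.1: push and mark
            v = unmarked[0]
            marked.add(v)
            order.append(v)
            stack.append(v)
        else:                            # step 2.1: otherwise pop
            stack.pop()
    return order


def bfs_order(succ, entry):
    """Order in which nodes are marked by Algorithm BFS."""
    marked = {entry}
    order = [entry]
    queue = deque([entry])               # step 1
    while queue:                         # step 2
        head = queue[0]
        unmarked = [v for v in succ[head] if v not in marked]
        if unmarked:                     # step 2.1: append and mark
            v = unmarked[0]
            marked.add(v)
            order.append(v)
            queue.append(v)
        else:                            # step 2.1: otherwise remove head
            queue.popleft()
    return order
```

The only difference between the two functions is the data structure: the stack makes the search follow one path as far as it can before backing up, while the queue visits all successors of a node before moving on.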


A search defines a numbering of the nodes determined by the order in
which they are visited. Two numberings of importance associated with DFS are
preorder numbering and postorder numbering: preorder numbering corresponds
to the order in which the nodes are first visited in a depth-first search and
postorder numbering corresponds to the order in which they are last visited in
a depth-first search.* These numberings are illustrated in Fig. 3.1. When a
graph is a tree, preorder numbering assures that every node has a higher number
than its parent and postorder numbering assures that every node has a lower
number than its parent. This is illustrated in Fig. 3.2. When an arbitrary
graph has a preorder numbering the presence of an edge (v_i,v_j) such that
v_i > v_j implies the graph is not a tree, but the converse is not true. When
an arbitrary graph has a postorder numbering the presence of an edge (v_i,v_j)
such that v_i < v_j implies the presence of a cycle; removal of all edges with
this property transforms the graph into an acyclic directed graph, but not
necessarily a tree. It is important to recognize that a preorder or postorder
numbering can be done quickly. The time required for a depth-first search
increases linearly with the number of edges in a connected graph: each edge
is traversed once in a forward direction and once in the backward direction,
thus the time complexity for the DFS algorithm is O(|E|).


*There has been some confusion in the literature with this terminology. Our
definition corresponds to recent usage [1] but differs from Knuth [10].
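
Both numberings can be produced in a single depth-first search, and the postorder test for cycle edges then reduces to one comparison per edge. The following Python sketch is illustrative: it uses a recursive search rather than the explicit stack of Algorithm DFS, the successor lists are those of Fig. 2.1 with an assumed successor order, and the function names are hypothetical.

```python
FIG_2_1 = {"a": ["b"], "b": ["c", "d"], "c": ["f"],
           "d": ["e", "f"], "e": ["d"], "f": []}


def number_nodes(succ, entry):
    """Return (preorder, postorder) numberings from one depth-first search."""
    pre, post = {}, {}

    def visit(n):
        pre[n] = len(pre) + 1        # numbered when first visited
        for v in succ[n]:
            if v not in pre:
                visit(v)
        post[n] = len(post) + 1      # numbered when last visited

    visit(entry)
    return pre, post


def cycle_edges(succ, post):
    """Edges (u, v) whose postorder numbers satisfy post[u] < post[v].

    Only nodes reached by the search carry a number, so other nodes are
    skipped. Self-loops (u, u) are not covered by the strict inequality.
    """
    return [(u, v) for u in post for v in succ[u] if post[u] < post[v]]


pre, post = number_nodes(FIG_2_1, "a")
back = cycle_edges(FIG_2_1, post)
```

For the graph of Fig. 2.1 the only edge singled out is (e,d), and removing it leaves an acyclic graph that is still not a tree, since d and f no longer have the same number of entering edges as a tree would require of f.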
It is useful for a number of applications to know whether a pair of nodes
can be on a path. This gives rise to the following problems: given a node, u,
determine all nodes v such that there is a path from u to v (symbolically this
is expressed as the set S(u) = {v | u →* v}); and given a node, u, determine
all nodes v such that there is a path from v to u (symbolically this is
expressed as P(u) = {v | v →* u}). Any algorithm for solving the first problem
can be used to solve the second problem simply by reversing the directions of
the edges. The reflexive and transitive closure of a graph is defined as the
set of all ordered pairs of nodes (u,v) such that there is a path (including a
path of zero length) from u to v (symbolically this is {(u,v) | u →* v}). If
paths of zero length are excluded then the set is called the transitive closure
(symbolically this is {(u,v) | u →+ v}).
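
The remark that the second problem reduces to the first by reversing edges can be sketched as follows. This is an illustrative Python fragment, not part of the original presentation; the function names are hypothetical, paths of zero length are counted (so u belongs to both S(u) and P(u)), and the example graph is that of Fig. 2.1.

```python
FIG_2_1 = {"a": ["b"], "b": ["c", "d"], "c": ["f"],
           "d": ["e", "f"], "e": ["d"], "f": []}


def reachable(succ, u):
    """S(u): all v such that there is a path from u to v."""
    seen = {u}
    stack = [u]
    while stack:
        n = stack.pop()
        for v in succ[n]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def reverse_edges(succ):
    """Graph with the direction of every edge reversed."""
    rev = {n: [] for n in succ}
    for u, vs in succ.items():
        for v in vs:
            rev[v].append(u)
    return rev


def predecessors(succ, u):
    """P(u): all v such that there is a path from v to u."""
    return reachable(reverse_edges(succ), u)
```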

An algorithm by Warshall [11] for computing transitive closure is based
on matrix multiplication and has time complexity O(|N|^3). It is not difficult
to see that the transitive closure could be computed by performing |N|
depth-first searches, one search from each node. This approach would require a
time proportional to |N||E|. Of course |E| ≤ |N|^2, so in the worst case the
time required would also be proportional to |N|^3. But for programs it is more
common that |E| ≤ k|N|, where k is a constant, and so the time would be
proportional to |N|^2.
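
The two approaches just compared can be placed side by side. The Python sketch below is illustrative: it stores the closure as a set of ordered pairs rather than as a Boolean matrix, the function names are hypothetical, and the example is the graph of Fig. 2.1. The triple loop of the first function is the source of the |N|^3 bound; the second function performs one search per node, each of cost proportional to |E|.

```python
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e"),
         ("e", "d"), ("d", "f"), ("c", "f")]


def warshall(nodes, edges):
    """Transitive closure by Warshall's algorithm, as a set of pairs."""
    reach = set(edges)
    for k in nodes:                  # allow k as an intermediate node
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def closure_by_search(nodes, edges):
    """Transitive closure by one depth-first search from each node."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    closure = set()
    for u in nodes:
        seen, stack = set(), [u]     # u itself is added only via a cycle
        while stack:
            n = stack.pop()
            for v in succ[n]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        closure |= {(u, v) for v in seen}
    return closure
```

Because paths of zero length are excluded, a pair (u,u) appears only when u lies on a cycle, as d and e do in Fig. 2.1.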

A clever idea described by Schnorr [12] may allow further improvement of
the computation time for transitive closure. Consider a node v of G(N,E,n0)
and suppose v →* u_1, v →* u_2, ..., v →* u_k where k = ⌈|N|/2 + 1⌉. Now
consider a graph G'(N,E') derived from G by simply reversing the direction of
all of the edges. Let w be a node and suppose in G': w →* r_1, w →* r_2, ...,
w →* r_k where k is defined as before. It is easy to see that the sets
{u_1, u_2, ..., u_k} and {r_1, r_2, ..., r_k} must have at least one node in
common since 2 × ⌈|N|/2 + 1⌉ equals |N| + 2 or |N| + 3 according as |N| is
even or odd; a common node lies on a path from v to w in G. This idea is the
basis of an algorithm which uses breadth-first search to compute transitive
closure but terminates the search from a node after ⌈|N|/2 + 1⌉ nodes have
been reached, unless it terminates earlier because no more nodes can be
reached. Then the idea just described is applied to complete the computation
of the transitive closure. The interesting property of this algorithm is that
its expected time for execution is O(|N| + |E|*), where |E|* is the expected
number of edges. The model used for computing the expected values consists of
an ensemble of random graphs with |N| nodes in which, for any pair of nodes u
and v, there is constant probability p for an edge (u,v). Unfortunately, this
ensemble probably does not match the ensemble for real programs very well.
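
The counting argument can be illustrated for a single pair of nodes. The Python sketch below is not Schnorr's algorithm itself, which computes the whole closure; it only shows how two truncated breadth-first searches, one in G and one in the reversed graph, settle whether there is a path from v to w. All function names are hypothetical, paths of zero length are counted, and the example is the graph of Fig. 2.1.

```python
from collections import deque

FIG_2_1 = {"a": ["b"], "b": ["c", "d"], "c": ["f"],
           "d": ["e", "f"], "e": ["d"], "f": []}


def reverse_edges(succ):
    """Graph with the direction of every edge reversed."""
    rev = {n: [] for n in succ}
    for u, vs in succ.items():
        for v in vs:
            rev[v].append(u)
    return rev


def bounded_bfs(succ, start, limit):
    """Breadth-first search that stops once `limit` nodes have been reached."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < limit:
        n = queue.popleft()
        for v in succ[n]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
                if len(seen) == limit:
                    break
    return seen


def has_path(succ, pred, v, w):
    """Decide whether there is a path from v to w using truncated searches."""
    k = (len(succ) + 1) // 2 + 1     # ceiling of |N|/2 + 1
    forward = bounded_bfs(succ, v, k)
    if len(forward) < k:             # search ran out: forward is all of S(v)
        return w in forward
    backward = bounded_bfs(pred, w, k)
    if len(backward) < k:            # search ran out: backward is all of P(w)
        return v in backward
    return True                      # two sets of k nodes must share a node


PRED = reverse_edges(FIG_2_1)
```

When neither search is cut short the answer is read from a complete reachable set; when both are cut short the two sets together hold more than |N| entries, so they overlap and a path must exist.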

A determination of strongly connected components is useful in some aspects
of program analysis. A strongly connected component, SCC, of a flow graph
G(N,E,n0) is a subset of N defined as follows: for every pair of nodes u, v
in the SCC there is a path from u to v and from v to u in G, and no more nodes
may be added to the SCC preserving this property. In general a graph may have
more than one SCC. Once the SCC's of G have been determined it is possible to
make a transformation G(N,E,n0) → G'(N',E',n0) where the elements of N' are the
SCC's of G and the elements of E' represent paths between components implied
by the elements of E: the idea is illustrated in Fig. 2.5. The flow graph G'
is acyclic and so a partial ordering of the nodes is possible. In particular,
if the nodes of G' are numbered in postorder then on every path n0 →* n,
n ∈ N', the numbering of the nodes will be in decreasing order. This partial
ordering is useful in data flow analysis. A fast algorithm for computing SCC's
has been described by Tarjan [13]. In this algorithm a pair of numbers,
preorder(n) and lowlink(n), is associated with each node n; preorder(n) is the
preorder number defined earlier, and lowlink(n) is defined by

$$\mathrm{lowlink}(n) = \min_{v \in S(n)} \mathrm{preorder}(v), \qquad S(n) = \{\, v \mid n \to^{*} v \,\}.$$

This numbering is illustrated in Fig. 3.3. Tarjan shows that a pair of nodes
are in the same SCC if and only if they have the same lowlink number and that
the lowlink numbers can be computed using a depth-first search. The time
complexity for Tarjan's algorithm applied to a flow graph is O(|N|+|E|).
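
A compact version of the depth-first computation is sketched below in Python. It follows the usual statement of Tarjan's algorithm, which is slightly more restrictive than the formula above: lowlink(n) is only allowed to drop to nodes that are still on the search stack, and a component is emitted whenever lowlink(n) equals preorder(n). The function and variable names are hypothetical, the search is recursive for brevity, and the example is the graph of Fig. 2.1.

```python
FIG_2_1 = {"a": ["b"], "b": ["c", "d"], "c": ["f"],
           "d": ["e", "f"], "e": ["d"], "f": []}


def strongly_connected_components(succ):
    """Return the SCC's of a graph as a list of sets of nodes."""
    preorder, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(n):
        preorder[n] = lowlink[n] = len(preorder) + 1
        stack.append(n)
        on_stack.add(n)
        for v in succ[n]:
            if v not in preorder:            # tree edge: search deeper
                visit(v)
                lowlink[n] = min(lowlink[n], lowlink[v])
            elif v in on_stack:              # edge back into the open part
                lowlink[n] = min(lowlink[n], preorder[v])
        if lowlink[n] == preorder[n]:        # n is the first node of its SCC
            component = set()
            while True:
                v = stack.pop()
                on_stack.discard(v)
                component.add(v)
                if v == n:
                    break
            components.append(component)

    for n in succ:
        if n not in preorder:
            visit(n)
    return components


SCCS = strongly_connected_components(FIG_2_1)
```

For Fig. 2.1 the only component with more than one node is {d, e}; collapsing each component to a single node yields the acyclic graph G' described in the text.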

In testing a program the notion of coverage or completeness of a set of
tests is important [14,15]. One measure of test coverage is the percentage of
nodes executed or edges traversed in the flow graph. Randomly selected input
data for a set of tests will generally give poor coverage, therefore some care
in choosing test data is necessary. In attempting to choose the test data
carefully the question of whether it is possible to execute a given set of
nodes in a test arises. This question is closely related to the following one
about a flow graph of the program: given the flow graph G(N,E,n0) and a set
N', N' ⊆ N, is there a path n0 →* n, n ∈ N, which includes every node in N'?
Note that if the answer to the graph question is no, then the answer to the
question about the program is certainly no; however, if the answer to the
graph question is yes one cannot infer that the answer to the question about
the program is yes because it may not be possible to satisfy all of the branch
conditions, as illustrated in Fig. 3.4. Gabow, Maheshwari, and Osterweil [5]
have described an algorithm for answering the flow graph question. The idea
is based on a consideration of an acyclic directed graph which would result
from an arbitrary flow graph if one were to replace all SCC's by single nodes.
Note that if any node v in N', the set of nodes to be included in the path, is
in a SCC then a path to any member of that SCC can be extended to include v.
Thus the original graph question only needs to be asked about an acyclic
directed graph. In presenting the algorithm we use the word "frontier" for
the set of entry nodes (initially the frontier is {n0}); removing a node from
the graph implies all edges leaving the node are also removed. Here is the
algorithm:

   until the frontier is empty or the frontier contains
         more than one node in N' do
      if the frontier is a singleton then remove this node
      else remove a frontier node that is not in N';
   stop.

If the graph is empty when this algorithm terminates then there is a path
n0 →* n which includes all nodes of N', and if the graph is not empty then
there is no such path. This algorithm is illustrated in Fig. 3.5. The time
complexity for this algorithm is O(|E|) and since the time complexity for
getting the SCC's by Tarjan's algorithm is also O(|E|) it follows that the
overall time complexity for answering the original graph question is O(|E|).
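
The frontier algorithm can be sketched as follows for a graph that is already acyclic, that is, after SCC's have been collapsed. This Python fragment is illustrative only: the function name and the small diamond-shaped example graph are hypothetical, and entering-edge counts are used to recognize when a node joins the frontier.

```python
def path_through_all(dag, required):
    """True if some path in the acyclic graph `dag` includes every node in
    `required`. `dag` maps each node to the list of its successors."""
    entering = {n: 0 for n in dag}
    for succs in dag.values():
        for s in succs:
            entering[s] += 1
    remaining = set(dag)
    frontier = {n for n in dag if entering[n] == 0}
    while frontier:
        if len(frontier & required) > 1:
            break                                # two required entry nodes
        if len(frontier) == 1:
            node = next(iter(frontier))          # singleton: remove it
        else:
            node = next(iter(frontier - required))
        frontier.discard(node)
        remaining.discard(node)
        for s in dag[node]:                      # edges leaving node go too
            entering[s] -= 1
            if entering[s] == 0:
                frontier.add(s)
    return not remaining                         # empty graph means yes


# Hypothetical example: a diamond 1 -> {2, 3} -> 4.
DIAMOND = {1: [2, 3], 2: [4], 3: [4], 4: []}
```

In the diamond, nodes 2 and 3 lie on alternative branches, so no single path includes both; nodes 2 and 4, on the other hand, lie on the path 1 → 2 → 4.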


Data flow analysis has applications in global program optimization [16,17],
error detection [9], discovery of unexecutable paths [18,19], and detection of
deadlock [20]. A basic problem in data flow analysis is the so-called live
variable problem. This problem can be stated as follows: given a flow graph
G(N,E,n0) with the nodes labeled to show the reference (r) actions and
definition (d) actions on a variable x, and given n, n ∈ N, does there exist a
path from n such that the first action on x is r? If the answer to this
question is yes, then x is said to be live at n, otherwise it is dead at n. If
only one variable and one node were involved the easiest way to answer this
question would be to conduct a depth-first search from n. However, the normal
situation is that this question is to be answered for many variables at all
nodes of the flow graph. A number of algorithms for treating this problem have
been described in the literature [6,7,8,16,21]. A particularly simple and
effective algorithm is the one due to Hecht and Ullman [8]. The main ideas are
as follows. We associate three sets with every node: at node n these sets are
ref(n), def(n), live(n). The sets are initialized as follows for all n, n ∈ N:

1. ref(n) = {V | V a variable referenced at n}

2. def(n) = {V | V a variable defined at n}

   (for simplicity we assume here that a variable is
   not referenced and defined at the same node.)

3. live(n) = ∅ (the empty set)


After this initialization ref(n) and def(n) are not changed, but live(n) is
modified by applying the following formula iteratively to the nodes of
G(N,E,n0):

$$\mathrm{live}(n) = \bigcup_{k \in S(n)} \Bigl[ \bigl( \mathrm{live}(k) \cap \neg\,\mathrm{def}(k) \bigr) \cup \mathrm{ref}(k) \Bigr]$$

where S(n) = {j | (n,j) ∈ E}, the set of successors of n, and ¬def(k) denotes
the complement of def(k). Two examples of the application of this formula are
shown in Fig. 3.6. Hecht and Ullman have shown [8] that if this formula is
applied iteratively to the nodes of G(N,E,n0) in postorder, the convergence is
quite rapid. In practice the number of iterations can be expected to be less
than four or five. The application of this algorithm to the detection of
uninitialized variables in entire programs has been thoroughly discussed by
Fosdick and Osterweil [9].
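
The iteration can be sketched in a few lines. The Python fragment below is illustrative and is not taken from [8]: the function name, the four-node example graph, and its ref and def sets are all hypothetical. The set difference live(k) - def(k) plays the role of the intersection with the complement of def(k), and the sweep is repeated until a full pass changes nothing.

```python
def live_variables(succ, ref, defs, order):
    """Apply the live formula to the nodes in `order` until nothing changes.

    succ, ref and defs map each node to its successors, referenced
    variables and defined variables. Returns (live sets, number of passes).
    """
    live = {n: set() for n in succ}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            new = set()
            for k in succ[n]:
                new |= (live[k] - defs[k]) | ref[k]
            if new != live[n]:
                live[n] = new
                changed = True
    return live, passes


# Hypothetical example: 1 -> 2 -> 3, with 3 -> 2 (a loop) and 3 -> 4.
SUCC = {1: [2], 2: [3], 3: [2, 4], 4: []}
REF = {1: set(), 2: {"x"}, 3: {"y"}, 4: {"z"}}
DEFS = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set()}
POSTORDER = [4, 3, 2, 1]   # postorder of a depth-first search from node 1

LIVE, PASSES = live_variables(SUCC, REF, DEFS, POSTORDER)
```

In this example the first pass in postorder already produces the final sets and the second pass only confirms that nothing changes.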

There is an interesting connection between an important problem in program
analysis and non-linear optimization. I mentioned earlier that a path in a
program flow graph may not represent a sequence of statements that could
actually be executed: Fig. 3.4 illustrates this situation. Suppose that we
are given a path n0 →* n in a flow graph and we wish to determine whether the
path can be executed. To do this imagine moving along the path writing down
the predicates that must be satisfied at every branch. In doing this we must
take into account changes in values of variables caused by assignments so that
a particular variable name represents the same value in every predicate. A
necessary and sufficient condition for the path to be executable is that the
system of predicates written down is consistent; that is, there must exist an
assignment of values to variables appearing therein such that every predicate
is satisfied. Now this issue of consistency is exactly the same one that
arises in trying to determine whether there is a feasible region defined by
the constraints in a non-linear (or linear) optimization problem. This problem
is difficult and there are no really good algorithms for dealing with it.
Clarke has discussed the problem and some experience in attempting to solve it
in a recent paper [22].
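
To make the notion of a consistent system of predicates concrete, the toy Python sketch below searches a small finite domain for an assignment that satisfies every predicate collected along a path. It is an illustration of the definition only and is not a practical method: exhaustive search is hopeless for realistic domains, which is the difficulty referred to above. The function name and the domain are hypothetical; the pair of predicates y > 0 and y < 0 is the one quoted in the caption of Fig. 3.4.

```python
from itertools import product


def feasible(predicates, variables, domain):
    """Return an assignment satisfying every predicate, or None.

    Each predicate is a function of a dict mapping variable names to values.
    Every combination of values from `domain` is tried in turn.
    """
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in predicates):
            return env
    return None


# Predicates along the unexecutable path of Fig. 3.4: y > 0 and y < 0.
unexecutable = feasible([lambda e: e["y"] > 0, lambda e: e["y"] < 0],
                        ["y"], range(-5, 6))

# A consistent system over two variables, for contrast.
executable = feasible([lambda e: e["y"] > 0, lambda e: e["x"] < e["y"]],
                      ["x", "y"], range(-2, 3))
```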

4.  CONCLUSION.  Recent  work  has  provided  a number  of  algorithms  which 
have  important  applications  in  the  analysis  of  computer  programs.  While  much 
work  remains  to  be  done  in  the  development  of  these  algorithms,  we  are  now  in 
a position  to  build  some  important  program  analysis  tools  based  on  these  algo- 
rithms. Such  tools  could  be  used  for  program  optimization,  error  detection, 
testing,  and  documentation. 

These  tools  should  be  organized  into  a library  that  is  easy  to  use.  This 
will  take  careful  planning.  Consistent  patterns  of  use  must  be  developed,  and 
portability,  and  adaptability  to  various  language  dialects  must  be  taken  into 
account.  It  is  a large  and  difficult  challenge,  but  one  which  we  should 
accept. 

Figure 2.2

Both of these graphs are acyclic, but only the graph on the left is a tree.
The graph on the right is not a tree because it has a node with more than one
edge entering it.


DO 20 K

20 CONTINUE


Figure 2.3

The mapping of statements onto the nodes of a flow graph is not always
1-1. The mapping of the DO statement in FORTRAN is an example of such
an exception, as shown here.

      IF ( - - ) GO TO 90

   90 INDEX = J/K
      IR = J - INDEX * K
      IF (IR.EQ.0) GO TO 95
      ISHFT = K - IR
      INDEX = INDEX + 1
   95 L = M(INDEX)


Figure 2.4

(a) Segment of a FORTRAN program; (b) Segment of a flow graph derived from
the program segment, with nodes representing individual statements;
(c) Segment of a flow graph, derived from the graph in (b), in which nodes
represent blocks.

      (MAIN PROGRAM)
      CALL SUB A ( )
      CALL SUB B ( )
      CALL SUB A ( )
      END

      SUBROUTINE SUB A (- - -)
      CALL SUB B ( )
      CALL SUB C ( )
      END

      SUBROUTINE SUB B (- - -)
      END

      SUBROUTINE SUB C ( )
      END


Figure 2.6

(a) FORTRAN program with subroutine calls; (b) Call graph for program in
(a).

Figure 2.7

(a) Flow graph with nodes numbered for identification and relevant
statements corresponding to nodes indicated; (b) Flow graph, derived from (a),
with data actions on X indicated; (c) Flow graph derived from (a) with
data actions on Y indicated.


Figure 3.1

(a) Flow graph with nodes numbered in preorder; (b) Flow graph with nodes
numbered in postorder.

Figure 3.2

(a) Tree with nodes numbered in preorder, each node has a higher number
than its parent; (b) Tree with nodes numbered in postorder, each node has
a lower number than its parent.

Figure 3.3

Flow graph with nodes labeled (i,j) where i is the preorder number and j
is the lowlink number. All nodes with the same lowlink number are in the
same strongly connected component and nodes with different lowlink numbers
are in different strongly connected components.
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e or 


X false 


Flew  graph  with 
ed.  The  path  a 
a ir.pl  ies  y > 0 


Figure  3-1* 


nodes  labeled,  and  associated  program  statements  indie 
* b + c + d + e is  unexecutable  since  the  computa Lion 
and  traversing  the  edge  (d,e)  implies  y < 0. 


’llustration  of  steps  in  the  Maheshwarl,  Gabow,  Osterweil  algorithm  to  de- 
termine existence  of  a path  through  a set  of  nodes.  Nodes  to  be  on  the 
path  are  marked  by  p in  (a).  Nodes  in  the  same  SCC  are  collapsed  into  a 
single  node  resulting  in  (b).  The  frontier  node  is  removed  resulting  in 
(c).  The  frontier  node  in  (c)  is  removed  resulting  in  (d).  Now  there  are 
two  nodes  on  the  frontier  both  of  which  are  to  be  on  the  path  and  the  al- 
gorithm stops.  Since  the  graph  is  not  empty  at  this  point  the  conclusion 
is  there  is  no  path  in  the  graph  (a)  which  includes  all  nodes  marked  p. 


A 
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1 ive(n) 

(after  1 iter . ) 


Cl 


3 y 


x,y,z 


ivevn, 
final  _] 

x,y,z 

x,y,z 

x,y,z 


Figure  3.6 


(a)  Illustration  of  live  sets,  for  given  sets  ref(n)  and  def(n)  on  a 
graph.  Nodes  are  numbered  in  postorder  and  just  one  application  of  the 
live  formula  to  the  nodes  in  postorder  produces  the  final  live  sets. 

(b) Another illustration of live sets. In this case two applications
of the live formula to the nodes in postorder are required to obtain the
final live sets.

Most scientists who occasionally have to solve numerically a differential
equation now own a hand held programmable calculator which will very often be
adequate. Since hand held calculators are slow, there is particular need to
keep the number of function evaluations to a minimum. At first thought, this
would seem to rule out use of Runge-Kutta methods, but recent developments,
such as those by Fehlberg (mostly unknown except to specialists), may make them
competitive after all. In the area of predictor-corrector methods, some
variations make excessive use of memory locations for a hand held calculator. An

Significance and Explanation


Most scientists who occasionally have to solve numerically a differential
equation now own a hand held programmable calculator which will very often be
adequate for the purpose. Since hand held calculators are slow, there is
particular need to keep the number of calculations of the function appearing
in the differential equation to a minimum. At first thought, this would seem
to rule out use of what are called Runge-Kutta methods, but some recent
developments may make them competitive after all for certain types of problems.
These types are identified, and a discussion, with numerical examples, is given
of how best to use the Runge-Kutta methods.

Some other methods which call for far fewer calculations of the function
require more memory locations than are available on many hand held calculators.

There still remain some methods which are modest both in the number of
calculations and in memory requirements. How best to use these on a hand held
calculator is explained, with numerical examples.



SOLVING  DIFFERENTIAL  EQUATIONS 
ON  A HAND  HELD  PROGRAMMABLE  CALCULATOR 

J.  Barkley  Rosser 

Dedicated  to  Prof.  Dr.  Johannes  Weissinger  on  his  65th  birthday. 


1. Preliminaries. The present discussion is limited to initial value ordinary
differential equations. One wishes to solve a system of equations

$$
\begin{aligned}
y_1' &= f_1(x, y_1, y_2, \ldots, y_p) \\
y_2' &= f_2(x, y_1, y_2, \ldots, y_p) \\
&\;\;\vdots \\
y_p' &= f_p(x, y_1, y_2, \ldots, y_p)
\end{aligned}
\tag{1.1}
$$

being given the values of $y_1, y_2, \ldots, y_p$ at $x = x_0$; here $y_i'$ denotes
$dy_i/dx$. We will discuss only the special case

$$ y' = f(x, y), \tag{1.2} $$

being given the value of $y$ at $x = x_0$; here $y'$ denotes $dy/dx$. The
discussion of (1.2) can easily be generalized to a system of equations. See Conte
and de Boor, 1972, pp. 365-366. Incidentally, sets of higher order equations
can be reduced to a system of first order equations, such as (1.1). See
Conte and de Boor, 1972, p. 365.

The person who only occasionally has to solve numerically a differential
equation may never have had a course in numerical analysis, or may have forgotten
much of it. So it seems necessary to make the present paper reasonably
self-contained, presupposing little previous experience. Even if the reader has
previous experience, it was presumably on a large fast computer. For the hand held
calculator, considerations are sufficiently different that one cannot rely too
much on such previous experience. The discussion to follow is comprehensive
enough to cover the important differences.

Also, there is disagreement between reputable texts on various points. An
adjudication between them, with explanations, is included, in spite of the fact
that this adds to the bulk of the present paper.

The overall procedure for solution is the classic one, namely to try step
by step to calculate approximations $y_1, y_2, y_3, \ldots$ for $y$ corresponding to
the values $x_1, x_2, x_3, \ldots$, where $x_0 < x_1 < x_2 < x_3 < \ldots$. We take $y_0$ as
the value given for $y$ at $x = x_0$.

We denote $x_{n+1} - x_n$ generically by $h$. Usually $h$ will be taken the
same for a considerable succession of steps. A change in $h$ is occasionally
called for, but not uncommonly this involves some travail, and in the main one
tries to avoid it. On the other hand, all authorities agree that at each step
(or at least frequently), tests should be made to see if a change of step length
is needed, shorter to keep errors under control, or longer to avoid unduly
extended calculation that would result from taking $h$ smaller than necessary.


2. The Euler method. If one has proceeded to a good approximation $y_n$ for
the value of $y$ at $x = x_n$, a Taylor's expansion will give

$$ y_{n+1} = y_n + h\,y_n' + \frac{h^2}{2}\,y''(\xi), \tag{2.1} $$

where $x_n < \xi < x_{n+1}$. We evaluate $y'$ by (1.2), and get

$$ y_{n+1} = y_n + h f(x_n, y_n) + \frac{h^2}{2}\,y''(\xi). \tag{2.2} $$

Unless one already knows the solution of (1.2), one has no way to calculate
the final term on the right of (2.2). Certainly, if $h$ is taken to be small
enough, the final term will be quite small, and the approximation

$$ y_{n+1} \cong y_n + h f(x_n, y_n) \tag{2.3} $$

will give a sufficiently accurate value for $y_{n+1}$, after which one gets $y_{n+2}$
by a similar formula, then $y_{n+3}$, etc. A problem with which we must cope is
being sure that we have taken $h$ small enough that repeated use of (2.3)
instead of (2.2) does not lead to serious error in the end.

Suppose we wish to solve

$$ y' = -y, \tag{2.4} $$

given that $y = 1$ when $x = 0$. Let us try to approximate the value of
$y$ when $x = 6$. As the answer for (2.4) is obviously $y = e^{-x}$, we can say
that for $0 < \xi \le 6$ we have $|y''(\xi)| < 1$. Hence the error in (2.3) will be
less than $h^2/2$. If we take steps of constant length $h$, we will require $6/h$
steps to get from $x = 0$ to $x = 6$. With an error less than $h^2/2$ at each
step, and $6/h$ steps, the final error will add up to less than

$$ \frac{6}{h} \cdot \frac{h^2}{2} = 3h. $$

From  this,  it  is  tempting  to  conclude  that  the  method  is  first  order;  that  is, 
the  overall  error  is  roughly  proportional  to  h . For  example,  if  we  decrease 
h by  a factor  of  2 , we  would  expect  to  decrease  the  error  at  x = 6 by 
approximately  a factor  of  2. 

Within  bounds,  the  conclusion  is  correct.  However,  the  argument  given 
above  to  support  it  is  fallacious.  To  see  this,  let  us  look  at  a specific 
example.  Take  h = 0.1.  By  (2.3),  we  will  get 

$$ y_1 = 0.9. $$

This is too small by about

$$ 0.0048374. $$

(In accordance with (2.2), this error is close to $h^2/2$.) However, the value
of $y$ at $x = 6$ is

$$ e^{-6} \cong 0.0024788. \tag{2.5} $$

Thus our final answer is less than the error on the first step. To suggest
that we can approximate the overall error at $x = 6$ by adding up such errors as
shown above for each of the 60 steps required to get from 0 to 6 is simply
not sound.

In fact, for the equation (2.4), use of (2.3) with $h = 0.1$ gives

$$ y_{n+1} = (0.9)\,y_n. $$

Using this 60 times would give an estimate for $y$ at $x = 6$ of

$$ (0.9)^{60} \cong 0.0017970. $$

This is in error by less than 28%. For a procedure in which the error on the
first step was almost twice the total final answer, this is not bad.
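The numbers quoted above can be reproduced with a few lines of code. What follows is only a minimal sketch of the Euler recurrence (2.3) in Python; the function name `euler` and its argument layout are illustrative choices, not anything taken from the paper or from a calculator program.

```python
import math

def euler(f, x0, y0, h, steps):
    """Advance y' = f(x, y) from (x0, y0) by `steps` Euler steps of length h."""
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * f(x, y)   # y_{n+1} = y_n + h f(x_n, y_n), formula (2.3)
        x = x + h
    return x, y

# Test equation (2.4): y' = -y, y(0) = 1, carried to x = 6 with h = 0.1.
x_end, y_end = euler(lambda x, y: -y, 0.0, 1.0, 0.1, 60)
true_value = math.exp(-6.0)
print("Euler estimate :", y_end)
print("true value     :", true_value)
print("relative error :", 1.0 - y_end / true_value)
```

The one function evaluation per step is the call `f(x, y)` inside the loop; the relative error printed at the end is the quantity the text describes as "less than 28%".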

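By (2.3), we get

$$ y_{n+1} = (1 - h)\,y_n \tag{2.6} $$

for the equation (2.4). This gives

$$
\begin{aligned}
y_{n+1} &= \left\{ e^{-h} - \frac{h^2}{2!} + \frac{h^3}{3!} - \cdots \right\} y_n \\
        &= \left\{ e^{-h} - \frac{h^2}{2}\left(1 - \frac{h}{3} + \frac{h^2}{12} - \cdots\right) \right\} y_n \\
        &= e^{-h}\left\{ 1 - \frac{h^2}{2}\, e^{h}\left(1 - \frac{h}{3} + \frac{h^2}{12} - \cdots\right) \right\} y_n .
\end{aligned}
$$

The factor

$$ e^{h}\left(1 - \frac{h}{3} + \frac{h^2}{12} - \cdots\right) \tag{2.7} $$

tends to remain constant and close to unity for $|h|$ small, since as $h$
increases the first factor increases and the second decreases. So we replace
(2.7) by unity, getting

$$ y_{n+1} \cong e^{-h}\left(1 - \frac{h^2}{2}\right) y_n . \tag{2.8} $$

So at $x = 6$, we get

$$
y \cong \left[ e^{-h}\left(1 - \frac{h^2}{2}\right) \right]^{6/h}
  = e^{-6}\left[ \left(1 - \frac{h^2}{2}\right)^{2/h^2} \right]^{3h}
  \cong e^{-6}\, e^{-3h} .
$$

For $h = 0.1$, we get

$$ y \cong e^{-6}\,(0.74082). $$

Hence, by the above analysis, we expect the calculated value to be too low
by about 26%: it was actually too low by about 28%.

The final formula,

$$ y \cong e^{-6}\, e^{-3h} \tag{2.9} $$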

shows  that  (for  a reasonable  range  of  h ) the  relative  error  is  indeed  of  order 
h . Actually  for  h = 0.1  and  the  equation  (2.4),  we  see  by  (2.2)  that  we  have 
about  0.5%  relative  error  at  each  step.  Accumulating  such  relative  errors  for 
60  steps  could  give  an  overall  error  of  30%,  which  is  about  what  we  got. 
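As a rough numerical check of (2.9), one can compare the ratio of the Euler result to the true value $e^{-6}$ against the predicted factor $e^{-3h}$ for several step lengths. The sketch below assumes the test equation (2.4) only; the helper name `euler_decay` and the particular list of step lengths are illustrative.

```python
import math

def euler_decay(h, x_end=6.0):
    """Euler estimate of y(x_end) for y' = -y, y(0) = 1, using (2.6): y_{n+1} = (1 - h) y_n."""
    steps = round(x_end / h)
    return (1.0 - h) ** steps

for h in (0.2, 0.1, 0.05, 0.025):
    actual = euler_decay(h) / math.exp(-6.0)   # computed value divided by true value
    predicted = math.exp(-3.0 * h)             # factor suggested by (2.9)
    print(f"h={h:<6} actual ratio={actual:.5f}  predicted ratio={predicted:.5f}")
```

Halving $h$ should roughly halve the distance of both ratios from unity, which is the sense in which the relative error is of order $h$.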

So  sometimes  it  is  absolute  error  that  one  should  accumulate  from  step 
to  step,  but  other  times  it  can  be  relative  error.  For  programmers  who  try  to 
write  general  purpose  differential  equation  solvers  for  large  computers  with 
error  control  built  in,  the  question  of  how  to  handle  error  accumulation  poses 
formidable  difficulties.  See  Hull,  et  al.,  1972,  pp.  607-608.  When  one  is 
solving  on  a hand  held  calculator,  one  sees  the  progress  of  the  solution,  step 
by  step.  In  regions  where  it  is  best  to  reckon  by  accumulating  relative  errors, 
one  can  do  so.  However,  when  it  would  be  better  to  accumulate  absolute  errors 
(for  example,  if  one  is  going  to  pass  through  a zero  value  of  y),  the  change 
to  accumulating  absolute  errors  is  easily  made.  Such  flexibility  is  hard  to 
arrange  in  a preset  program  for  a large  computer. 

By (2.9), if we should wish to get to $x = 6$ with 0.1% accuracy, we should
have to take $h$ about 1/3000. Thus we would require 18,000 steps. This would
not be too bad on a large fast computer, but on a hand held calculator it could
require hours, especially if our equation to be solved were more complicated
than (2.4). So we need something better than the Euler method.

Actually,  there  were  two  reasons  for  including  the  present  section  on  the 
Euler  method.  One  is  to  demonstrate  the  importance  of  maintaining  flexibility 
about  whether  one  is  accumulating  relative  errors  or  absolute  errors.  With  a 
hand  held  calculator,  such  flexibility  is  easily  maintained.  With  a preset 
program for a large fast computer, such flexibility is almost impossible.
Thus, in Hull, et al., 1972, the decision was made to lock one's self into
accumulating absolute errors (see their p. 607). To get to $x = 6$ with a
0.1% error by this scheme would require far more than 18,000 steps. On the other
hand, the large computers are so very fast that it is still a practical scheme.

The  other  reason  for  mentioning  the  Euler  method  is  that  it  is  useful  for 
"looking  ahead."  Using  a few  steps  with  large  h takes  very  little  time  and 
gives  at  least  a general  idea  of  what  one  might  expect  to  encounter  for  some 
distance  ahead.  If  one  takes  such  a "look  ahead"  every  so  often  it  can  help  in 
planning  how  best  to  arrange  the  upcoming  part  of  the  integration. 


3. Runge-Kutta methods. The text Henrici, 1977, explains how to do many
calculations on the HP-25 programmable calculator. On p. 182, he suggests that for
solving the differential equation (1.2) one might use the second order
Runge-Kutta method

$$ y_{n+1} = y_n + \frac{h}{2}\left\{ f(x_n, y_n) + f\bigl(x_{n+1},\, y_n + h f(x_n, y_n)\bigr) \right\}. \tag{3.1} $$

Applied to

$$ y' = k y, \tag{3.2} $$

this gives

$$ y_{n+1} = \left( 1 + hk + \frac{(hk)^2}{2} \right) y_n . $$

An analysis like that in the previous section shows that if we set $y = 1$ at
$x = 0$, then at $x = X$ we would get approximately

$$ y \cong e^{kX}\, e^{-h^2 k^3 X/6} . \tag{3.3} $$

The first factor on the right is what $y$ should be, and the second factor
shows about how far off we are from the true value. So, if we take $k = -1$
(as in (2.4)) and $X = 6$, and ask for 0.1% accuracy, we need to take $h$ about
1/32. Thus it will take 192 steps to get to $x = 6$. However (note (3.1)),
each step requires TWO evaluations of the function $f(x,y)$. So we require 384
function evaluations.
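The step (3.1) and the estimate (3.3) can be tried out as follows. This is a sketch under the assumption that the test problem is (2.4), so that $k = -1$ and $X = 6$; the names `heun_step` and `heun` are illustrative labels for formula (3.1), not names used in the paper.

```python
import math

def heun_step(f, x, y, h):
    """One step of (3.1); it costs two evaluations of f."""
    k = f(x, y)                                   # first evaluation
    return y + 0.5 * h * (k + f(x + h, y + h * k))  # second evaluation

def heun(f, x0, y0, h, steps):
    """Repeat (3.1) `steps` times and return the final y."""
    x, y = x0, y0
    for _ in range(steps):
        y = heun_step(f, x, y, h)
        x += h
    return y

h = 1.0 / 32.0
y6 = heun(lambda x, y: -y, 0.0, 1.0, h, 192)
print("observed ratio to e^-6 :", y6 / math.exp(-6.0))
print("ratio predicted by (3.3):", math.exp(h * h))   # k = -1, X = 6 gives exp(h^2)
```

Both ratios should differ from unity by about one part in a thousand, which is the 0.1% accuracy asked for in the text.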




Actually, we require more than that. As stated above, one should make
frequent checks to see if $h$ is about the right size; this is a point that we
failed to come to grips with in the previous section. Analogous to (2.2), there
is a formula (see pp. 192-200 of Ralston, 1965) that gives the error of (3.1) as

$$ h^3 K + h^4 L + \cdots , $$

where $K$, $L$, etc., are complicated functions of derivatives of $y$ and partial
derivatives of $f(x,y)$. For small $h$, we may say that the error of (3.1) is
about $h^3 K$. If we take two successive steps, from $y_n$ to $y_{n+2}$, and if $K$
does not change much from one step to the next, then we could accumulate a local
error of $2h^3 K$ (over and above whatever error we had at $y_n$) in getting to
$y_{n+2}$. Now apply (3.1) with $2h$ in place of $h$ to get from $y_n$ to $y_{n+2}$ in
a single step. Assuming that $K$ is not fluctuating badly, we would make an
error of about $(2h)^3 K$ in getting to $y_{n+2}$, or four times the error we made in
getting to $y_{n+2}$ by two steps. So now we have two approximations for $y_{n+2}$,
one with an error about four times the other. From this, we can estimate about
how far off our better approximation is. If it is close enough, we take it as
the value of $y_{n+2}$. If it is not close enough, we have to go back to $y_n$ and
try over again with a smaller value of $h$. See Hull, et al., 1972, bottom of
p. 616. How close is "close enough" is a tricky question for which there seems
not to be a very good answer. After all, this involves only two steps, and one
has to worry about the accumulation of errors over many steps. And, as noted in
the previous section, there is the question if one should be worried about
accumulation of absolute errors or of relative errors. But at least one has
to have some sort of estimate of the step by step errors.

As pointed out above, if we take two steps of length $h$ each to get from
$y_n$ to $y_{n+2}$, we probably have an estimate that is in error by about one fourth
of what we would get if we went from $y_n$ to $y_{n+2}$ in one step of length $2h$.
If the error were EXACTLY one fourth, we could write a formula for the true
value of $y_{n+2}$. It is tempting to take this formula as the value for $y_{n+2}$.
It most likely gives a better estimate for $y_{n+2}$. However, there is no way to
say how much better; an estimate for the step by step error would then not be
available. Also, one is left with no improved value for $y_{n+1}$. So we had best
consider that this formula merely provides an estimate of the error after each
pair of steps.
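The pair-of-steps check described above can be sketched in Python as follows. The error estimate used here, one third of the difference between the one-step and two-step values, is simply what follows from assuming the coarse error is exactly four times the fine one; the function names and the sign convention (estimate of computed value minus true value) are assumptions of this sketch, not something the paper specifies.

```python
import math

def heun_from_slope(f, x, y, h, k):
    """Step (3.1) from (x, y) when k = f(x, y) has already been evaluated."""
    return y + 0.5 * h * (k + f(x + h, y + h * k))

def pair_with_error_estimate(f, x, y, h):
    """Two steps of length h plus one check step of length 2h from the same point.

    Returns (y_{n+1}, y_{n+2}, estimated error of y_{n+2}).
    """
    k0 = f(x, y)                                         # evaluation 1, shared
    y1 = heun_from_slope(f, x, y, h, k0)                 # evaluation 2
    y2 = heun_from_slope(f, x + h, y1, h, f(x + h, y1))  # evaluations 3 and 4
    y2_coarse = heun_from_slope(f, x, y, 2.0 * h, k0)    # evaluation 5
    # If the coarse error is four times the fine error, their difference
    # is three times the fine error.
    estimate = (y2_coarse - y2) / 3.0
    return y1, y2, estimate

h = 1.0 / 32.0
y1, y2, est = pair_with_error_estimate(lambda x, y: -y, 0.0, 1.0, h)
print("estimated error of y_2:", est)
print("actual error of y_2   :", y2 - math.exp(-2.0 * h))
```

Reusing `k0` for the check step is what keeps the cost at five function evaluations per pair of steps rather than six.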

For the pair of steps of length $h$, we required two function evaluations
per step. For the step of length $2h$, we also require two evaluations, except
that the evaluation of $f(x,y)$ at $x = x_n$ has already been made. So, for the
evaluation of $y_{n+1}$ and $y_{n+2}$, with an estimate of error at $x_{n+2}$, we require
altogether five function evaluations. So for the 192 steps to get from $x = 0$
to $x = 6$, that is, 96 pairs of steps, we require 480 function evaluations.

While  this  is  a great  improvement  over  the  18,000  function  evaluations  required 
by  the  Euler  method,  it  could  be  rather  time  consuming  if  f(x,y)  is  at  all 
complex. 

So  we  wish  for  something  better.  We  cannot  manage  anything  better  on  the 
HP-25.  It  has  only  50  program  steps,  and  implementation  of  (3.1)  uses  39  of 
these,  leaving  only  11  program  steps  for  the  calculation  of  f(x,y).  In  many 
cases,  11  program  steps  will  not  be  adequate.  So,  if  we  are  to  do  much  with 
differential  equations,  we  need  a more  capable  calculator  than  the  HP-25. 

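4. Alternatives to Runge-Kutta. Let us suppose that we have a calculator at
least as capable as the HP-65. Most programmable calculators now on the market
are appreciably more capable than the HP-65, but it sufficed for the calculations
recorded in this paper. With it, one can carry out higher order Runge-Kutta
methods than (3.1). Suppose we use the classical fourth order Runge-Kutta.
See (5.6-48) on p. 200 of Ralston, 1965, or (6.37) on p. 338 of Conte and de Boor,
1972. Incidentally, if we have two equations

$$ y' = f(x, y, z), \tag{4.1} $$

$$ z' = g(x, y, z), \tag{4.2} $$

then the classical fourth order Runge-Kutta method consists of

$$ y_{n+1} = y_n + \tfrac{1}{6}\,(k_1 + 2k_2 + 2k_3 + k_4), \tag{4.3} $$

$$ z_{n+1} = z_n + \tfrac{1}{6}\,(\ell_1 + 2\ell_2 + 2\ell_3 + \ell_4), \tag{4.4} $$

where

$$
\begin{aligned}
k_1 &= h f(x_n, y_n, z_n), & \ell_1 &= h g(x_n, y_n, z_n), \\
k_2 &= h f(x_n + \tfrac{h}{2},\, y_n + \tfrac{k_1}{2},\, z_n + \tfrac{\ell_1}{2}), & \ell_2 &= h g(x_n + \tfrac{h}{2},\, y_n + \tfrac{k_1}{2},\, z_n + \tfrac{\ell_1}{2}), \\
k_3 &= h f(x_n + \tfrac{h}{2},\, y_n + \tfrac{k_2}{2},\, z_n + \tfrac{\ell_2}{2}), & \ell_3 &= h g(x_n + \tfrac{h}{2},\, y_n + \tfrac{k_2}{2},\, z_n + \tfrac{\ell_2}{2}), \\
k_4 &= h f(x_n + h,\, y_n + k_3,\, z_n + \ell_3), & \ell_4 &= h g(x_n + h,\, y_n + k_3,\, z_n + \ell_3).
\end{aligned}
$$

See Conte and de Boor, 1972, pp. 365-366. Extension to a system of equations
such as (1.1) is obvious. Incidentally, the error in (4.3) is approximately
$h^5 K$ for a suitable $K$, and the error in (4.4) correspondingly.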
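For readers who prefer code to formulas, the classical fourth order Runge-Kutta step for a pair of equations $y' = f(x,y,z)$, $z' = g(x,y,z)$ can be sketched as below. The intermediate quantities follow the standard textbook form of the method; the function name `rk4_system_step` and the test system $y' = z$, $z' = -y$ (whose solution is $y = \sin x$, $z = \cos x$) are illustrative assumptions, not taken from the paper.

```python
import math

def rk4_system_step(f, g, x, y, z, h):
    """One classical fourth order Runge-Kutta step for y' = f(x,y,z), z' = g(x,y,z)."""
    k1 = h * f(x, y, z)
    l1 = h * g(x, y, z)
    k2 = h * f(x + h / 2, y + k1 / 2, z + l1 / 2)
    l2 = h * g(x + h / 2, y + k1 / 2, z + l1 / 2)
    k3 = h * f(x + h / 2, y + k2 / 2, z + l2 / 2)
    l3 = h * g(x + h / 2, y + k2 / 2, z + l2 / 2)
    k4 = h * f(x + h, y + k3, z + l3)
    l4 = h * g(x + h, y + k3, z + l3)
    y_next = y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    z_next = z + (l1 + 2 * l2 + 2 * l3 + l4) / 6
    return y_next, z_next

# Illustrative test system: y' = z, z' = -y with y(0) = 0, z(0) = 1.
x, y, z, h = 0.0, 0.0, 1.0, 0.1
for _ in range(10):
    y, z = rk4_system_step(lambda x, y, z: z, lambda x, y, z: -y, x, y, z, h)
    x += h
print(y, math.sin(1.0))
print(z, math.cos(1.0))
```

Each step costs four evaluations of each right-hand side, which is the count used in the text when the cost of the method is tallied.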

If we wish to solve (2.4) out to $x = 6$ with 0.1% accuracy, we must take
$h$ slightly less than 3/8. Taking $h = 3/8$ is probably close enough, which
would require 16 steps. There are four function evaluations per step. To
accomplish the procedure suggested above for estimating the step by step error
of $y_{n+2}$ after a pair of steps, we must also make a single step of length $2h$
from $y_n$ to $y_{n+2}$. This also takes four evaluations, but one has been done
before. So we require 11 function evaluations for each pair of steps. So we
need 88 function evaluations

Can we do better? In Enright and Hull, 1976, testing is reported of four
sorts of methods:

1. Runge-Kutta-Fehlberg methods.

2. Rational extrapolation methods.

3. Adams type predictor-corrector methods.

4. Milne type predictor-corrector methods.

The Runge-Kutta-Fehlberg methods are Runge-Kutta type methods improved
according to ideas in Fehlberg, 1968, 1969, and 1970, to give easier ways of
estimating step by step errors. This is purported to reduce somewhat the
number of function evaluations. In view of certain advantages of Runge-Kutta
methods, the Fehlberg improvements make the Runge-Kutta methods fairly
competitive when the function evaluations can be done quickly. However on p. 949 of
Enright and Hull, 1976, some disadvantages with the Fehlberg methods are reported.
By phone (March, 1978) T. E. Hull of Toronto informed the present writer that
new improvements had eliminated certain of these disadvantages, and that IMSL
has recently embodied these new improvements in its Runge-Kutta software
package. However, at best these methods are competitive only when the function
evaluations can be done quite quickly.

The rational extrapolation methods derive from ideas of Gragg, 1965, and
were developed carefully in Bulirsch and Stoer, 1966. In Enright and Hull, 1976,
these methods are given a fairly good rating when the function evaluations can
be done quickly. They seem complex to program for a hand held calculator, and
so we will say no more about them.

Enright and Hull, 1976, give their highest ratings to certain Adams type
predictor-corrector methods. However, their conclusions do not necessarily hold
for hand held calculators because of the very limited number of memory locations
of hand held calculators. So we will make a study of Adams type and Milne type
predictor-corrector methods for hand held calculators.

Incidentally, one cannot merely take one of the programs tested in Enright
and Hull, 1976, and put it on his hand held calculator. In order to accommodate
the wide diversity of vagaries that can arise in solutions of differential
equations, these programs have a large amount of "overhead", and are beyond the
capabilities of present hand held calculators, besides being so long as to be
very time consuming at the slow speed of hand held calculators. However, unless
one has a very large system of equations to solve, one can proceed step by step
on a hand held calculator. One "looks ahead" periodically as an aid to planning
the calculation, one monitors the accumulation of errors as one goes, and one is
alert for idiosyncrasies that might arise. If the latter do arise, one can try
shifting methods; if worse comes to worst, one can try a change of variables,
or more complex stratagems.

The classical predictor-corrector method is that of Milne (see (28.1) and
(28.2) on p. 65 of Milne, 1953, or (5.5-12) on p. 182 of Ralston, 1965):

$$ y^{(p)}_{n+1} = y_{n-3} + \frac{4h}{3}\left( 2y'_n - y'_{n-1} + 2y'_{n-2} \right), \tag{4.5} $$

$$ y_{n+1} = y_{n-1} + \frac{h}{3}\left\{ f\bigl(x_{n+1},\, y^{(p)}_{n+1}\bigr) + 4y'_n + y'_{n-1} \right\}. \tag{4.6} $$

This uses two function evaluations per step. There is the obvious one in
(4.6), and when one comes to use (4.5) for the next step, one needs $y'_{n+1}$, which
calls for a second function evaluation. Incidentally, it is required that

$$ h = x_{n+1-i} - x_{n-i} $$

for $i = 0, 1, 2, 3$.

To use this method, we must have $y_{n-3}$, $y_{n-1}$, $y'_{n-2}$, $y'_{n-1}$, and $y'_n$ stored,
and for the next step we will have to have $y_{n-2}$ and $y_n$. So our storage
requirements amount to four values of $y$ and three of $y'$. In addition, one
has to carry the current value of $x_n$. This uses up eight memory locations,
which for all practical purposes are all there are on the HP-65. This leaves
no memory locations to use in evaluating $f(x,y)$. So use of the classical
Milne predictor-corrector would often not be possible on the HP-65. For
calculators with more memory locations, the large requirement for memory
locations could preclude use of the Milne predictor-corrector if one wishes to
solve a system of even as few as three equations. A more compelling reason not
to use the Milne method is that it is now known to be unstable in certain
circumstances which arise not too infrequently. We shall produce
predictor-corrector methods that have much more modest memory requirements and are
stable besides. Meanwhile the Milne predictor-corrector will be used to illustrate
typical features of predictor-corrector methods.

As the name would indicate, a predictor-corrector method embodies a
predictor and a corrector. The predictor is (4.5) and the corrector is

$$ y^{(c)}_{n+1} = y_{n-1} + \frac{h}{3}\left( y'_{n+1} + 4y'_n + y'_{n-1} \right), \tag{4.7} $$

which is nothing more than Simpson's rule for integrating $y'$ approximately.
The obvious disadvantage of (4.7) is that one needs the value of $y'_{n+1}$ to get
the value of $y^{(c)}_{n+1}$, whereas by (1.2) one needs the value of $y^{(c)}_{n+1}$ to get the
corresponding value of $y'_{n+1}$. Actually, if $h$ is reasonably small and $f(x,y)$
is reasonably well behaved, one can get out of this impasse by an iteration
scheme; after making a guess for $y'_{n+1}$, successively substitute the current
$y'_{n+1}$ into (4.7) to get a $y^{(c)}_{n+1}$, and substitute $y^{(c)}_{n+1}$ into (1.2) to get a better
value for $y'_{n+1}$. In the usual case that will arise, this will converge to a
$y'_{n+1}$ and $y^{(c)}_{n+1}$ that satisfy both (4.7) and (1.2). Unfortunately, it is prodigal
with function evaluations.

So we adopt a compromise. A predictor is given, in this case (4.5), which
produces a reasonably close approximation for $y_{n+1}$. This is substituted into
(1.2) to get a guess for $y'_{n+1}$. This is then substituted into the corrector;
the net result is embodied in (4.6). There the process is stopped. We do not
have as much accuracy as we could get with a few more iterations, but we have
held the total number of function evaluations to two per step.

Since we need four values of $y$ to use the predictor (and at equally spaced
values of $x$), these four values must somehow be obtained before we can start to
use this process. On pp. 61-64 of Milne, 1953, is given a scheme to get four
starting values. It is now generally agreed that the best way to get started is
to calculate $y_1$, $y_2$, and $y_3$ by Runge-Kutta.
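The whole scheme, Runge-Kutta start followed by the predictor (4.5) and the single corrector pass (4.6), can be sketched as follows. This is an illustration only: it keeps the full history of values in Python lists for clarity, whereas a calculator program would hold only the few values listed in the text, and the names `rk4_step` and `milne` are hypothetical.

```python
import math

def rk4_step(f, x, y, h):
    """Classical fourth order Runge-Kutta step for a single equation y' = f(x, y)."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)
    k3 = h * f(x + h / 2, y + k2 / 2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def milne(f, x0, y0, h, steps):
    """Milne predictor-corrector with one corrector pass per step (steps >= 3)."""
    xs = [x0 + i * h for i in range(steps + 1)]
    ys = [y0]
    for n in range(3):                       # starting values y_1, y_2, y_3
        ys.append(rk4_step(f, xs[n], ys[n], h))
    ds = [f(x, y) for x, y in zip(xs, ys)]   # y'_0 ... y'_3
    for n in range(3, steps):
        # predictor (4.5)
        yp = ys[n - 3] + 4 * h / 3 * (2 * ds[n] - ds[n - 1] + 2 * ds[n - 2])
        # corrector applied once, formula (4.6): first evaluation of the step
        y_new = ys[n - 1] + h / 3 * (f(xs[n + 1], yp) + 4 * ds[n] + ds[n - 1])
        ys.append(y_new)
        ds.append(f(xs[n + 1], y_new))       # second evaluation, used by later steps
    return xs, ys

xs, ys = milne(lambda x, y: -y, 0.0, 1.0, 0.1, 10)
print(ys[-1], math.exp(-1.0))
```

After the start-up, each pass through the loop makes exactly two calls to `f`, in line with the count given above.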

One attractive feature of predictor-corrector methods is the ease with which
one gets an estimate of the step by step error. The error for (4.5) is

$$ \frac{14}{45}\, h^5\, y^{(v)}(\xi), $$

and that for (4.7) is

$$ -\frac{1}{90}\, h^5\, y^{(v)}(\xi), $$

provided that one has values of $y'_{n+1}$ and $y^{(c)}_{n+1}$ that satisfy both of (4.7) and
(1.2). Then the error in (4.7) will be about

$$ -\frac{1}{29}\left( y^{(c)}_{n+1} - y^{(p)}_{n+1} \right). \tag{4.8} $$

Actually, the value of $y_{n+1}$ given by (4.6) is close enough to what one would
get by iterating with (4.7) that the error in $y_{n+1}$ from (4.6) is approximately
what one would get by using $y_{n+1}$ from (4.6) in place of $y^{(c)}_{n+1}$ in (4.8).
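The factor 1/29 can be traced to the two error terms. The short derivation below is a sketch that assumes "error" means true value minus computed value and that $y^{(v)}$ takes about the same value in both error terms; neither assumption is spelled out in the text.

```latex
\begin{align*}
y(x_{n+1}) - y^{(p)}_{n+1} &\approx \tfrac{14}{45}\,h^5\,y^{(v)}, \\
y(x_{n+1}) - y^{(c)}_{n+1} &\approx -\tfrac{1}{90}\,h^5\,y^{(v)}, \\
y^{(c)}_{n+1} - y^{(p)}_{n+1} &\approx \left(\tfrac{14}{45} + \tfrac{1}{90}\right) h^5\,y^{(v)}
   = \tfrac{29}{90}\,h^5\,y^{(v)}, \\
y(x_{n+1}) - y^{(c)}_{n+1} &\approx -\tfrac{1}{29}\left(y^{(c)}_{n+1} - y^{(p)}_{n+1}\right).
\end{align*}
```

Subtracting the second line from the first gives the third, and dividing the second by the third gives the last line.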

Since  one  needs  four  consecutive  values  of  y at  equal  step  sizes  for  the 

Milne  method,  a change  of  step  size  (should  it  be  required)  is  not  easy.  If 

one  has  as  many  as  seven  preceding  consecutive  steps  of  equal  length,  one  can 

double  the  step  size  by  picking  every  other  value  from  the  present  and  preceding 

y's.  However  this  would  require  recovering  the  values  of  y r,  y ,,  and 

y'  . besides  the  values  that  one  usually  stores.  Or  one  could  carry  on  for 
n-4  1 

three  more  steps  before  doubling,  being  careful  to  save  the  key  values. 


If one wishes to halve the step size, one can use

$$ y_{n-\frac{1}{2}} = \frac{1}{128}\left\{ 45y_n + 72y_{n-1} + 11y_{n-2} + h\left( -9y'_n + 36y'_{n-1} + 3y'_{n-2} \right) \right\}, \tag{4.9} $$

$$ y_{n-\frac{3}{2}} = \frac{1}{128}\left\{ 11y_n + 72y_{n-1} + 45y_{n-2} - h\left( 3y'_n + 36y'_{n-1} - 9y'_{n-2} \right) \right\} \tag{4.10} $$

(see p. 208 of Hamming, 1962, or (A57) and (A58) on p. 451 of Rosser, 1967).

Unfortunately, doubling or halving the step size is often not the most
efficient change to make. For a change of a different size, one can use
interpolation formulas, of which (4.9) and (4.10) are samples, but it might be
simpler just to make a fresh start, generating the next three values of $y$
by Runge-Kutta.
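The two half-step interpolation formulas can be checked numerically. In the sketch below, the function name `halve_values` and its tuple arguments are illustrative; the check uses $y = x^5$ with $x_n = 0$ and $h = 1$, a case for which interpolation through three values and three derivatives should be exact.

```python
def halve_values(y, d, h):
    """Interpolate y at the two midpoints needed when halving the step.

    y = (y_n, y_{n-1}, y_{n-2}) and d = (y'_n, y'_{n-1}, y'_{n-2}).
    Returns (y_{n-1/2}, y_{n-3/2}) by formulas (4.9) and (4.10).
    """
    yn, yn1, yn2 = y
    dn, dn1, dn2 = d
    y_half = (45 * yn + 72 * yn1 + 11 * yn2
              + h * (-9 * dn + 36 * dn1 + 3 * dn2)) / 128
    y_three_half = (11 * yn + 72 * yn1 + 45 * yn2
                    - h * (3 * dn + 36 * dn1 - 9 * dn2)) / 128
    return y_half, y_three_half

# Check with y = x^5 at x_n = 0, x_{n-1} = -1, x_{n-2} = -2 (so h = 1).
ys = (0.0, -1.0, -32.0)     # x^5 at 0, -1, -2
ds = (0.0, 5.0, 80.0)       # 5 x^4 at 0, -1, -2
mid1, mid2 = halve_values(ys, ds, 1.0)
print(mid1, (-0.5) ** 5)
print(mid2, (-1.5) ** 5)
```

With the new values in hand, a step of length $h/2$ has the equally spaced history that the predictor requires.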


5. Adams predictor-correctors. For the Adams method of order $r$, the predictor
(which is commonly called an Adams-Bashforth formula) is

$$ y^{(p)}_{n+1} = y_n + h \sum_{i=0}^{r-1} \alpha_i\, y'_{n-i} + K_p\, h^{r+1}\, y^{(r+1)}(\xi), \tag{5.1} $$

and the corrector (which is commonly called an Adams-Moulton formula) is

$$ y^{(c)}_{n+1} = y_n + h \sum_{i=0}^{r-1} \beta_i\, y'_{n+1-i} - K_c\, h^{r+1}\, y^{(r+1)}(\xi). \tag{5.2} $$

The $h$ that appears is

$$ h = x_{n+1-i} - x_{n-i}, $$

which is required to be the same for $i = 0, \ldots, r-1$.

The error term on the right of (5.2) is based on the assumption that

$$ y'_{n+1} = f\bigl(x_{n+1},\, y^{(c)}_{n+1}\bigr). \tag{5.3} $$

As indicated with (4.7), values of $y'_{n+1}$ and $y^{(c)}_{n+1}$ that satisfy both (5.2)
and (5.3) can usually be found by an iterative process. That is, if one
chooses a $\bar{y}_{n+1}$ that is near the limiting $y^{(c)}_{n+1}$, and then forms

$$ \bar{y}'_{n+1} = f\bigl(x_{n+1},\, \bar{y}_{n+1}\bigr) $$

and substitutes this $\bar{y}'_{n+1}$ for $y'_{n+1}$ on the right of (5.2), one will get a
value nearer to $y^{(c)}_{n+1}$ than $\bar{y}_{n+1}$. Repetition of this converges to $y^{(c)}_{n+1}$.
However, this is costly in function evaluations, and what is mostly done is to
define

$$ y^{(t)}_{n+1} = y_n + h\,\beta_0\, f\bigl(x_{n+1},\, y^{(p)}_{n+1}\bigr) + h \sum_{i=1}^{r-1} \beta_i\, y'_{n+1-i}, \tag{5.4} $$

after which one takes $y_{n+1} = y^{(t)}_{n+1}$. This holds the number of function
evaluations to two per step. (One may think of the superscript "t" as standing
for "traditional.")
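As a concrete instance, here is a sketch of the scheme for $r = 4$. The coefficients used are the standard fourth order Adams-Bashforth and Adams-Moulton values, which the paper does not list in this passage, so they are stated here as an assumption; the names `ALPHA`, `BETA`, `adams_step`, and `adams` are likewise illustrative, and the start-up by Runge-Kutta is one reasonable choice among several.

```python
import math

ALPHA = (55 / 24, -59 / 24, 37 / 24, -9 / 24)   # predictor weights, r = 4
BETA = (9 / 24, 19 / 24, -5 / 24, 1 / 24)       # corrector weights, r = 4

def rk4_step(f, x, y, h):
    """Classical fourth order Runge-Kutta step, used only to get started."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)
    k3 = h * f(x + h / 2, y + k2 / 2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def adams_step(f, x, y, d, h):
    """d = [y'_n, y'_{n-1}, y'_{n-2}, y'_{n-3}]; returns y_{n+1} by (5.1) then (5.4)."""
    yp = y + h * sum(a * di for a, di in zip(ALPHA, d))        # (5.1), error term dropped
    tail = sum(b * di for b, di in zip(BETA[1:], d))           # beta_1 y'_n + beta_2 y'_{n-1} + ...
    return y + h * (BETA[0] * f(x + h, yp) + tail)             # (5.4)

def adams(f, x0, y0, h, steps):
    """Integrate with three Runge-Kutta start-up steps, then Adams steps (steps >= 3)."""
    x, y, d = x0, y0, []
    for _ in range(3):
        d.insert(0, f(x, y))            # newest derivative first
        y = rk4_step(f, x, y, h)
        x += h
    for _ in range(steps - 3):
        d.insert(0, f(x, y))            # y'_n: one evaluation
        d = d[:4]                       # keep only the r = 4 most recent derivatives
        y = adams_step(f, x, y, d, h)   # the second evaluation happens inside
        x += h
    return x, y

x_end, y_end = adams(lambda x, y: -y, 0.0, 1.0, 0.1, 10)
print(y_end, math.exp(-1.0))
```

Only $y_n$, $x_n$, and the $r$ most recent derivatives are carried from step to step, which is the storage pattern the discussion of memory locations relies on.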

The requirements for memory locations are one for $y_n$, $r$ for $y'_{n-i}$ ($i = 0, \ldots, r-1$), and one for $x_n$. Actually, on a calculator like the HP-65, one can reduce the memory requirements to one fewer by judicious use of the stack. Suppose we are at $x_n$, and have $y_n$, but have not yet calculated $y'_n$. Let us have $y'_{n-i}$ stored at memory location $i$, for $i = 1, \ldots, r-1$, with $y_n$ at location $r$, and $x_n$ at location $r+1$. We calculate

$$\sum_{i=1}^{r-1} \alpha_i\, y'_{n-i},$$

and store it in location 1, having for $i = r-2, r-3, \ldots, 1$ successively stored $y'_{n-i}$ in location $i+1$. Now calculate

$$y'_n = f(x_n, y_n).$$

Push an extra copy of this up in the stack to hold momentarily. Now add $\alpha_0 y'_n$ to what was stored in location 1, put the extra copy of $y'_n$ into location 1, and proceed with the calculation of $y^{(p)}_{n+1}$ by (5.1). We now have in storage or in the stack everything needed for calculation of $y^{(t)}_{n+1}$ by (5.4).

In the above, it is assumed that the $\alpha$'s and $\beta$'s are part of the program. The $\alpha$'s and $\beta$'s are fairly simple numbers, so that this does not seem to overload the program, unless one has a very large value of $r$, in which case one hopes that additional memory locations will be available to store the $\alpha$'s and $\beta$'s.

The Adams formulas of any order, with an error term, can be derived by use of the Newton backward difference formula. Details are given on pp. 340-342 and 350-351 of Conte and de Boor, 1972, where the Adams method of order 4 is derived. A rather diffuse explanation is scattered through a number of pages of Milne, 1953, but on p. 50 is a table from which the coefficients can be calculated up to order 9. Henrici, 1962, gives coefficients of the predictors on p. 194 and of the correctors on p. 199, both up to order 6. Neither the tables of Milne nor of Henrici give the error terms, but these can be derived easily by taking $y = x^{r+1}$ in (5.1) and (5.2), which will give the values of $K_p$ and $K_c$. Correctors up to order 9, with error terms, can be got by a trivial modification of formulas (A2), (A4), (A7), (A12), (A20), (A26), (A34), and (A42) on pp. 446-449 of Rosser, 1967.
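The device of taking $y = x^{r+1}$ can be carried out mechanically in exact arithmetic. The sketch below is an illustration under two conveniences that are assumptions of this example, not of the text: it takes $h = 1$ and $x_n = 0$, so that $y_n = 0$, the exact $y_{n+1}$ is 1, and $y^{(r+1)}$ is the constant $(r+1)!$. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def adams_error_constants(alpha, beta):
    """Return (K_p, K_c) for an order-r Adams pair, using y = x**(r+1).

    Assumes h = 1 and x_n = 0 (a convenience of this sketch), so that
    y'(x) = (r+1) * x**r and y^(r+1) = (r+1)! everywhere.
    """
    r = len(alpha)
    dy = lambda x: (r + 1) * Fraction(x) ** r
    top = factorial(r + 1)
    # (5.1): 1 = sum(alpha_i * y'(-i)) + K_p * (r+1)!
    k_p = (1 - sum(a * dy(-i) for i, a in enumerate(alpha))) / top
    # (5.2): 1 = sum(beta_i * y'(1-i)) - K_c * (r+1)!
    k_c = (sum(b * dy(1 - i) for i, b in enumerate(beta)) - 1) / top
    return k_p, k_c


half = Fraction(1, 2)
# Order two: predictor weights (3/2, -1/2), corrector weights (1/2, 1/2).
print(adams_error_constants([3 * half, -half], [half, half]))
```

For order two this yields $K_p = 5/12$ and $K_c = 1/12$, so the corrector's error constant is one fifth of the predictor's.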

In order to get started, we need the $r$ values $y_0, y_1, \ldots, y_{r-1}$.

For the Adams method of order two, we have the predictor and corrector

$$y^{(p)}_{n+1} = y_n + \frac{h}{2}\,(3y'_n - y'_{n-1}) + \frac{5}{12} h^3 y'''(\xi), \tag{5.5}$$

$$y^{(c)}_{n+1} = y_n + \frac{h}{2}\,(y'_{n+1} + y'_n) - \frac{1}{12} h^3 y'''(\xi). \tag{5.6}$$

We recognize (5.6) as the trapezoid rule. As noted for the general case, we define

$$y^{(t)}_{n+1} = y_n + \frac{h}{2}\,\bigl(f(x_{n+1}, y^{(p)}_{n+1}) + y'_n\bigr), \tag{5.7}$$

and take $y_{n+1} = y^{(t)}_{n+1}$.

As $y_0$ is given, we need to obtain a value only for $y_1$ to get started. If one wishes to avoid use of Runge-Kutta, the value of $y_1$ can be obtained by iterating with (5.6). The same applies if one needs to make a restart after changing the length of the step.
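A minimal sketch of the order-two scheme, with $y_1$ obtained by iterating the trapezoid rule (5.6), might look as follows. The function name, the fixed count of start-up iterations, the Euler seed for that iteration, and the choice $x_0 = 0$ are all assumptions of this example; the iteration is a plain fixed-point iteration, which presumes that $h\,\partial f/\partial y$ is small enough for it to converge.

```python
import math


def adams2_t(f, y0, h, n_steps, start_iters=20):
    """Integrate y' = f(x, y) from x = 0 with the order-two 't' scheme.

    y_1 comes from iterating the trapezoid rule (5.6); afterwards each
    step uses the predictor (5.5) and one pass of the corrector, (5.7).
    Returns the list [y_0, y_1, ..., y_n].
    """
    dy0 = f(0.0, y0)
    # Start-up: fixed-point iteration on (5.6), seeded with an Euler step
    # (the seed and the iteration count are assumptions of this sketch).
    y1 = y0 + h * dy0
    for _ in range(start_iters):
        y1 = y0 + 0.5 * h * (f(h, y1) + dy0)
    ys = [y0, y1]
    dy_prev, dy_curr = dy0, f(h, y1)
    for n in range(1, n_steps):
        x_next = (n + 1) * h
        y_pred = ys[n] + 0.5 * h * (3.0 * dy_curr - dy_prev)      # (5.5)
        y_next = ys[n] + 0.5 * h * (f(x_next, y_pred) + dy_curr)  # (5.7)
        ys.append(y_next)
        # Second function evaluation of the step, kept for the next one.
        dy_prev, dy_curr = dy_curr, f(x_next, y_next)
    return ys
```

A call such as `adams2_t(lambda x, y: -y, 1.0, 0.1, 60)` carries $y' = -y$, $y(0) = 1$ out to $x = 6$ with two function evaluations per step after the start-up.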

While the values of $\xi$ in the error terms of (5.5) and (5.6) will scarcely ever be the same, if $y'''$ is slowly varying then the error of $y^{(c)}_{n+1}$ will be about one fifth that of $y^{(p)}_{n+1}$ if we have iterated on (5.6) until both (5.6) and (5.3) are satisfied. In practice, the value of $y^{(t)}_{n+1}$ is close enough to this limiting value of $y^{(c)}_{n+1}$ that one can say that the error of $y^{(t)}_{n+1}$ is about one fifth that of $y^{(p)}_{n+1}$. In other words the step by step error of the new $y_{n+1}$ is about

$$\frac{1}{6}\,\bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr). \tag{5.8}$$

If it were not so, then one is probably using too large a value of $h$.
One is tempted to try for more accuracy by defining

$$y^{(e)}_{n+1} = y^{(t)}_{n+1} - \frac{1}{6}\,\bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.9}$$

and then taking $y_{n+1}$ to be $y^{(e)}_{n+1}$. (One may think of the superscript "e" as standing for "extrapolated".) This does indeed seem to give more accuracy. If one would use

$$y^{(c)}_{n+1} - \frac{1}{6}\,\bigl(y^{(c)}_{n+1} - y^{(p)}_{n+1}\bigr),$$

with the limiting value of $y^{(c)}_{n+1}$, one would have a third order method. See Ralston, 1965, p. 186. As $y^{(t)}_{n+1}$ is close to $y^{(c)}_{n+1}$, one has close to a third order method. So one should have more accuracy, but one has no way to tell how much more; there is no way to estimate the step by step error. Indeed, as we shall shortly show, numerical examples disclose a somewhat erratic behavior of $y^{(e)}_{n+1}$.
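The step error estimate and the extrapolated value are cheap by-products of one order-two step, since both use only the difference between corrected and predicted values. The sketch below is illustrative; the function name and the convention that the estimate approximates "computed minus exact" are assumptions of this example.

```python
import math


def adams2_step_with_estimate(f, h, x_n, y_n, dy_n, dy_nm1):
    """One order-two Adams step returning (y_t, y_e, est).

    y_t is the corrected value (5.7); est = (y_t - y_p) / 6 estimates the
    step error of y_t (taken here as computed minus exact, an assumption
    of this sketch); y_e = y_t - est is the extrapolated value (5.9).
    """
    y_p = y_n + 0.5 * h * (3.0 * dy_n - dy_nm1)      # predictor (5.5)
    y_t = y_n + 0.5 * h * (f(x_n + h, y_p) + dy_n)   # corrector once (5.7)
    est = (y_t - y_p) / 6.0
    return y_t, y_t - est, est
```

For $y' = -y$ with exact data at $x = -h$ and $x = 0$ and a small $h$, the estimate should track the actual error of $y^{(t)}_{n+1}$ to within several percent, and $y^{(e)}_{n+1}$ should lie closer to the true value over that single step. That says nothing about how the extrapolated values behave over a long integration, which is the point at issue in the text.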


The doctrine on use of $y^{(e)}_{n+1}$ is not clear. In Ralston, 1965, on p. 186, such use is not favored. Indeed, it is there stated that it affects the stability properties, and this is also affirmed on p. 210 in Hamming, 1962. However, in Hamming, 1962, there is advocacy of more complicated schemes which make use of an equivalent of $y^{(e)}_{n+1}$. For the Adams methods, which are strongly stable, the present writer has found no evidence that use of $y^{(e)}_{n+1}$ causes stability to deteriorate, though it does indeed for some other types of predictor-corrector methods. However, the lack of any way to estimate the step by step error if one uses $y^{(e)}_{n+1}$ seems a strong point against it.

Since we have brought up the question of stability, we should note that usage of the term is not uniform. Many people, following Dahlquist, 1956, say that stability requires that all roots of a certain difference equation should be less than unity in absolute value. In Hamming, 1962, on p. 191, it is pointed out that according to that definition no method for solving $y' = ky$ could be stable for $k > 0$. So Hamming introduces (on p. 191) the notion of relative stability, in which roots can be larger than unity in absolute value provided they are less in absolute value than a certain selected root. In Definition 5.2 on p. 176 of Ralston, 1965, we find Ralston taking this as the definition of stability; what Hamming calls "relative stability" is called "stability" by Ralston. We concur with Ralston in our use of the word "stable." If a method is stable in this sense, one can safely carry out a numerical integration of indefinite extent by means of it.

It turns out that to achieve stability for predictor-corrector methods, one must maintain certain bounds for

$$h\,\frac{\partial f(x,y)}{\partial y}. \tag{5.10}$$

The same is claimed to be true for some Runge-Kutta methods. See Shampine and Watts, 1977, p. 270. Writers of texts on numerical analysis have been very lax about calling this to the attention of their readers, and most people probably harbor the illusion that Runge-Kutta methods are stable under all circumstances. Fortunately, the Runge-Kutta's of orders two, three, and four given in this paper happen to be relatively stable (in the sense of Hamming) in all circumstances, and hence give no trouble about stability. Far and away most people who use a Runge-Kutta use one of these three, which is probably why almost no cases of instability have been reported when using Runge-Kutta.

A discussion of why one needs to set bounds for (5.10) to avoid instability is given in Hamming, 1962, on pp. 189-190 and in Ralston, 1965, on pp. 169-178. Certainly, when one is using predictor-corrector methods, one should make routine estimates of $\partial f(x,y)/\partial y$ to check whether (5.10) is remaining in bounds. A numerical example of what can happen if one fails to do this will be given in Section 6. If one has two values $y$ and $y + \epsilon$, and $\epsilon$ is fairly small, then we get a good estimate by

$$\frac{\partial f(x,y)}{\partial y} \approx \frac{f(x, y+\epsilon) - f(x, y)}{\epsilon}. \tag{5.11}$$

For the calculations proposed, we can take $x = x_{n+1}$, $y + \epsilon = y^{(p)}_{n+1}$, and $y = y^{(t)}_{n+1}$. Thus we can get our estimate by (5.11) without having to perform any additional function evaluations, since $f(x_{n+1}, y^{(t)}_{n+1})$ must be calculated to provide the value of $y'_{n+1}$ required in the predictor for the next step.
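Because both slopes are already on hand, the monitoring of (5.10) costs only a subtraction and a division per step. The sketch below is illustrative: the function names and the idea of passing a single lower bound are assumptions of this example, and the bound itself must come from the stability range of whichever scheme is in use.

```python
def dfdy_estimate(f_hi, f_lo, y_hi, y_lo):
    """Difference quotient (5.11) from two slopes that are already known."""
    if y_hi == y_lo:
        raise ValueError("the two y values coincide; (5.11) is undefined")
    return (f_hi - f_lo) / (y_hi - y_lo)


def stability_monitor(h, f_at_pred, f_at_corr, y_pred, y_corr, lower_bound):
    """Return (h * df/dy estimate, True if it is not below lower_bound).

    lower_bound is supplied by the caller, e.g. read from a table of
    stability ranges; it is not computed here.
    """
    h_dfdy = h * dfdy_estimate(f_at_pred, f_at_corr, y_pred, y_corr)
    return h_dfdy, h_dfdy >= lower_bound
```

For $f(x,y) = -2xy^2$ near $x = 1$, $y = 0.5$, the quotient is close to the exact $\partial f/\partial y = -4xy = -2$, so with $h = 0.3$ the monitored quantity is about $-0.6$.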

The reason that $y^{(t)}_{n+1}$ is close to the $y^{(c)}_{n+1}$ that one would get by iterating the corrector, (5.6), is because $y^{(p)}_{n+1}$ is already fairly close to that $y^{(c)}_{n+1}$; the use of $y^{(p)}_{n+1}$ in (5.7) makes $y^{(t)}_{n+1}$ even closer to $y^{(c)}_{n+1}$ than $y^{(p)}_{n+1}$. If we could find a still closer approximation than $y^{(p)}_{n+1}$ to use in (5.7), we could do still better. A closer approximation is

$$y^{(p)}_{n+1} + \bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.12}$$

since this is $y^{(t)}_{n+1}$. Obviously we cannot use this until we have calculated $y^{(t)}_{n+1}$.


However, with $h$ fairly small, $y^{(t)}_{n+1} - y^{(p)}_{n+1}$ does not vary hugely from step to step. So if we define

$$y^{(a)}_{n+1} = y^{(p)}_{n+1} + \bigl(y^{(t)}_n - y^{(p)}_n\bigr), \tag{5.13}$$

then $y^{(a)}_{n+1}$ should be closer to $y^{(c)}_{n+1}$ than $y^{(p)}_{n+1}$. (We may consider the "a" as standing for "adjusted.") This suggestion is made at the bottom of p. 201 of Hamming, 1962, but its use is there discouraged although later a slight variation of it is strongly advocated. (We shall discuss this Hamming variation later.)
If we use $y^{(a)}_{n+1}$ as an improved predictor, we would define

$$y^{(ta)}_{n+1} = y_n + \frac{h}{2}\,\bigl(f(x_{n+1}, y^{(a)}_{n+1}) + y'_n\bigr), \tag{5.14}$$

and then use $y^{(ta)}_{n+1}$ for $y_{n+1}$, instead of $y^{(t)}_{n+1}$. Since $y^{(ta)}_{n+1}$ is closer to $y^{(c)}_{n+1}$ than $y^{(p)}_{n+1}$ or $y^{(t)}_{n+1}$, it would be better to define $y^{(a)}_{n+1}$ by

$$y^{(a)}_{n+1} = y^{(p)}_{n+1} + \bigl(y^{(ta)}_n - y^{(p)}_n\bigr). \tag{5.15}$$

Actually, there is trouble getting started with (5.15). If one has only $y_0$ and $y_1$, there is no way to get $y^{(p)}_1$, and so (5.15) cannot be used to get $y^{(a)}_2$. We will discuss this later. However, once one has got started, we will define $y^{(a)}_{n+1}$ by (5.15) rather than (5.13), and use this in (5.14).

Though $y^{(ta)}_{n+1}$ is closer to $y^{(c)}_{n+1}$ than $y^{(t)}_{n+1}$ is, it is not necessarily closer to the true value of $y_{n+1}$. Indeed, numerical results which will be given shortly seem to support the proposition that use of $y^{(ta)}_{n+1}$ rather than $y^{(t)}_{n+1}$ results in somewhat poorer results about half of the time. However it results in appreciably better results the other half of the time. Also, its behavior is quite a bit less erratic than for $y^{(e)}_{n+1}$. So, on the whole, it seems a good idea.

To implement this, we would have to allocate an extra memory location to store $y^{(ta)}_n - y^{(p)}_n$ for use in calculating $y^{(a)}_{n+1}$ for the next step. If one is pressed for memory space, one may have to forego use of $y^{(a)}_{n+1}$. Also, there is difficulty about calculating the first instance of $y^{(ta)}$. If one has $y_0$ given, and $y_1$ estimated by some means, one can follow the suggestion of Hamming, 1962, on p. 206, that one would simply set $y^{(a)}_2 = y^{(p)}_2$. That is, in (5.15), we take $y^{(ta)}_1 - y^{(p)}_1 = 0$. What this amounts to is that we use $y^{(t)}_2$ as an approximation for $y_2$. It may indeed be as good an approximation for $y_2$ as we had earlier got for $y_1$. Once we have done this, we can continue on by (5.14) and (5.15). Alternatively, one can generate an approximation for $y_2$ by the same means that we got an approximation for $y_1$. Then we can get $y^{(a)}_3$ by (5.13), which is close to (5.15), after which one can use (5.15) for subsequent steps.
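The adjusted scheme needs only one stored difference beyond what the traditional scheme keeps. The sketch below follows the first start-up option described above (zero adjustment on the first predicted step); the function name, the use of a Python list for the history, and $x_0 = 0$ are assumptions of this example.

```python
import math


def adams2_ta(f, y0, y1, h, n_steps):
    """Order-two Adams with the adjusted predictor, (5.14) and (5.15).

    y0 and y1 are starting values at x = 0 and x = h. The first step uses
    a zero adjustment, i.e. y^(a)_2 = y^(p)_2. Returns [y_0, ..., y_n].
    """
    ys = [y0, y1]
    dy_prev, dy_curr = f(0.0, y0), f(h, y1)
    adjust = 0.0  # holds y^(ta)_n - y^(p)_n: the one extra memory location
    for n in range(1, n_steps):
        x_next = (n + 1) * h
        y_p = ys[n] + 0.5 * h * (3.0 * dy_curr - dy_prev)     # (5.5)
        y_a = y_p + adjust                                    # (5.15)
        y_ta = ys[n] + 0.5 * h * (f(x_next, y_a) + dy_curr)   # (5.14)
        adjust = y_ta - y_p        # saved for the next step
        ys.append(y_ta)
        dy_prev, dy_curr = dy_curr, f(x_next, y_ta)
    return ys
```

The cost per step is still two function evaluations; the only overhead against the traditional scheme is one subtraction, one addition, and the extra stored value.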

It is again tempting to try for more accuracy by defining

$$y^{(ea)}_{n+1} = y^{(ta)}_{n+1} - \frac{1}{6}\,\bigl(y^{(ta)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.16}$$

and then taking $y_{n+1}$ to be $y^{(ea)}_{n+1}$. This has the same advantages and disadvantages as using $y^{(e)}_{n+1}$, but appears to be less erratic.

If we are taking $y_{n+1}$ to be $y^{(ta)}_{n+1}$, then the two function evaluations that would be available to put on the right side of (5.11) would be with $y + \epsilon = y^{(a)}_{n+1}$ and $y = y^{(ta)}_{n+1}$. For the second order Adams method that we are considering, this should work well. However, in Hamming, 1962, on p. 207, concern is expressed that, for a high order Adams method, $y^{(a)}_{n+1}$ and $y^{(ta)}_{n+1}$ might be so close together that $\epsilon$ and the numerator on the right of (5.11) could be appreciably altered by round off error, so that use of (5.11) would give a poor estimate of $\partial f(x,y)/\partial y$. One should not be working close enough to the bounds for (5.10) that a sharp estimate for $\partial f(x,y)/\partial y$ is required. However, one should investigate the round off properties of his calculator to see if there is danger of a really poor answer from (5.11). If there is, one will occasionally have to use an extra function evaluation to get an extra value $f(x_{n+1}, y)$ from which to estimate $\partial f(x,y)/\partial y$ by (5.11).

To give some assessment of the merits of the preceding procedures, we have calculated the relative errors that would result at $x = 6$. This has been done for the three equations $y' = y$, $y' = -y$, and $y' = -2xy^2$, all with $y(0) = 1$, and for three cases $h = 0.1$, $0.2$, and $0.3$. Needless to say, for $y' = -2xy^2$, the solution is $y = (1+x^2)^{-1}$. The row labelled R-K in Table 1 gives the result of the Runge-Kutta of order two that we discussed earlier. The rows labelled "t", "e", etc. refer to the cases where $y^{(t)}_{n+1}$, $y^{(e)}_{n+1}$, etc. are used for $y_{n+1}$. The rows labelled "tH" and "eH" will be explained shortly.

Reference to Table 1 will verify some comments which were made before. The behavior of $y^{(e)}_{n+1}$ is distinctly erratic, the most extreme case being for $y' = y$ and $h = 0.3$, where $y^{(e)}_{n+1}$ gives a poorer result than $y^{(t)}_{n+1}$. It will be noted that results using $y^{(ta)}_{n+1}$ are poorer than $y^{(t)}_{n+1}$ for $y' = y$ and better for $y' = -y$, but on the whole less erratic than for $y^{(e)}_{n+1}$.

In both Hamming, 1962, and Ralston, 1965, there is advocacy of using $y^{(H)}_{n+1}$ rather than $y^{(a)}_{n+1}$, where we define

$$y^{(H)}_{n+1} = y^{(p)}_{n+1} + \frac{5}{6}\,\bigl(y^{(tH)}_n - y^{(p)}_n\bigr), \tag{5.17}$$

TABLE 1


Relative error at x = 6 for second order Adams.


9.24  « 
-3.65  x 
-4.88  x 
-4.67  x 
7.10  x 
-3.26  x 
-1.47  x 


3.39  x 
-1.02  x 
-1.83  x 
-1.69  x 
5.09  x 

-1.73  x 
-5.11  x 


6.96  x 10 

1.47  x io 
3.73  x io 
3.31  x io 
1.54  x io 
3.40  x io 

4.47  x io 


y'_-  -2x^ 
-1.52  x io' 


-4.16  x io 


-1.19  x io 

9.89  x io' 

2.90  x io" 
3.88  x io' 
2.98  x io" 

-2.64  x io" 
-1.85  x IQ- 


having defined

$$y^{(tH)}_{n+1} = y_n + \frac{h}{2}\,\bigl(f(x_{n+1}, y^{(H)}_{n+1}) + y'_n\bigr), \tag{5.18}$$

we take $y_{n+1}$ to be $y^{(tH)}_{n+1}$. (The superscript "H" stands for "Hamming.") This appears at the appropriate place in Table 1. Hamming further proposes taking

$$y^{(eH)}_{n+1} = y^{(tH)}_{n+1} - \frac{1}{6}\,\bigl(y^{(tH)}_{n+1} - y^{(p)}_{n+1}\bigr) \tag{5.19}$$

and using $y^{(eH)}_{n+1}$ for $y_{n+1}$, though Ralston frowns on this (see p. 186 of Ralston, 1965). The results of this also are shown in Table 1.

Hamming's rationale seems to be as follows. Subtracting $y^{(p)}_{n+1}$ from both sides of (5.19) gives

$$y^{(eH)}_{n+1} - y^{(p)}_{n+1} = \frac{5}{6}\,\bigl(y^{(tH)}_{n+1} - y^{(p)}_{n+1}\bigr). \tag{5.20}$$

So we can write (5.17) as

$$y^{(H)}_{n+1} = y^{(p)}_{n+1} + \bigl(y^{(eH)}_n - y^{(p)}_n\bigr). \tag{5.21}$$

If there is very little change of $y^{(eH)}_n - y^{(p)}_n$ from $n$ to $n+1$, then we are very nearly taking $y^{(H)}_{n+1}$ to be $y^{(eH)}_{n+1}$. As expressed on p. 206 of Hamming, 1972, we "mop up" the error of $y^{(p)}_{n+1}$. Indeed, later on the same page there is the suggestion that one might just take $y_{n+1}$ to be $y^{(H)}_{n+1}$, and so cut down the number of function evaluations to one per step. Presumably one still has to go through the evaluation of $y^{(eH)}_{n+1}$ at each step, to have it available for use in (5.21) at the next step, but for $y'_{n+1}$ in the next predictor we would use

$$f(x_{n+1}, y^{(H)}_{n+1}).$$

Actually, one cannot quite hold it down to one evaluation per step, since one will occasionally need two evaluations in a step for use in (5.11). Incidentally, Hamming does not make an analysis of the stability of this proceeding. It has the disadvantage that there is no way to estimate the step by step error.

Actually, if (5.19) is to give a third order method, then $y^{(tH)}_{n+1}$ should really be $y^{(c)}_{n+1}$. For this, we should like $y^{(H)}_{n+1}$ in (5.18) to be as close as possible to $y^{(c)}_{n+1}$. But the definition (5.17), being equivalent to (5.21), is liable to make $y^{(H)}_{n+1}$ closer to $y^{(eH)}_{n+1}$ than to $y^{(c)}_{n+1}$, and the earlier procedure of using $y^{(a)}_{n+1}$ would seem preferable. However, according to Table 1, $y^{(eH)}_{n+1}$ seems to do a shade better than $y^{(ea)}_{n+1}$. The very erratic value at $y' = y$, $h = 0.3$ for $y^{(e)}_{n+1}$ is disquieting.

If one just stops at $y^{(tH)}_{n+1}$, and takes this to be $y_{n+1}$, instead of going on to $y^{(eH)}_{n+1}$ (this is what is proposed in Ralston, 1965, on p. 189), it seems better to use the simpler $y^{(ta)}_{n+1}$. From what numerical evidence we have, $y^{(tH)}_{n+1}$ is better than $y^{(ta)}_{n+1}$ in half of the cases, and worse in half.

To get started, we calculated $y_1$ and $y_2$ from the known solutions of the equations, and then (as suggested earlier) used (5.13) to calculate $y^{(a)}_3$.

If we use these methods to solve $y' = ky$, we get stability in the ranges shown in Table 2. This means that for stability the bounds shown in Table 2 must be satisfied by (5.10). The fact that one has stability for all positive values is surprising. This fact is not particularly useful, since for large $hk$ the step by step errors would be so large as to render the method of little value.
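One way such a range can be explored numerically is sketched below for the "t" scheme only. Everything here is an assumption of the example rather than the derivation promised for Section 6: substituting $f = ky$ and $z = hk$ into (5.5) and (5.7) gives the recurrence $y_{n+1} = (1 + z + \tfrac{3}{4}z^2)\,y_n - \tfrac{1}{4}z^2\,y_{n-1}$, and the scheme is called relatively stable when the root of its characteristic equation nearest $e^z$ strictly exceeds the other root in absolute value.

```python
import cmath
import math


def t_scheme_roots(z):
    """Roots of rho**2 - (1 + z + 0.75*z**2)*rho + 0.25*z**2 = 0, z = h*k."""
    a = 1.0 + z + 0.75 * z * z
    b = 0.25 * z * z
    s = cmath.sqrt(a * a - 4.0 * b)
    return (a + s) / 2.0, (a - s) / 2.0


def relatively_stable(z, tol=1e-12):
    """True if the root nearest exp(z) strictly dominates the other root.

    This criterion is an assumed reading of relative stability.
    """
    r1, r2 = t_scheme_roots(z)
    target = math.exp(z)
    if abs(r1 - target) <= abs(r2 - target):
        principal, parasitic = r1, r2
    else:
        principal, parasitic = r2, r1
    return abs(parasitic) < abs(principal) * (1.0 - tol)


def lower_stability_limit(step=0.1, max_steps=100):
    """Scan z = 0, -step, -2*step, ... and return the last stable value."""
    k = 0
    while k < max_steps and relatively_stable(-(k + 1) * step):
        k += 1
    return -k * step
```

Under these assumptions the two roots become a complex conjugate pair of equal modulus once $z$ falls below $-2/3$, so a scan in steps of 0.1 stops at $-0.6$, while every positive $z$ passes the test.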

TABLE 2

Stability ranges for y' = ky for Adams second order.

    t      -0.6 ≤ hk
    e      -0.8 ≤ hk
    ta     -0.7 ≤ hk
    ea     -0.7 ≤ hk
    tH     -0.7 ≤ hk
    eH     -0.7 ≤ hk

Observe  that  in  Table  2,  use  of  extrapolated  values  does  not  diminish  the 
region  of  stability.  If  anything,  the  reverse  is  true. 

In Section 6 we will indicate how the values in Table 2 were derived.

We now turn to third order methods. We will use the Runge-Kutta

$$y_{n+1} = y_n + \frac{1}{6}\,(k_1 + 4k_2 + k_3), \tag{5.22}$$

where

$$k_1 = h f(x_n, y_n),$$
$$k_2 = h f\bigl(x_n + \tfrac{h}{2},\; y_n + \tfrac{k_1}{2}\bigr),$$
$$k_3 = h f(x_n + h,\; y_n - k_1 + 2k_2);$$

see (5.6-46) on p. 199 of Ralston, 1965, or 25.5.8 on p. 896 of Abramowitz and Stegun, 1964.
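The third order Runge-Kutta (5.22) translates directly into code. In the sketch below the function names and the sign convention for the relative error, (exact minus computed) divided by exact, are assumptions of this example.

```python
import math


def rk3_step(f, x, y, h):
    """One step of the third order Runge-Kutta (5.22)."""
    k1 = h * f(x, y)
    k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
    k3 = h * f(x + h, y - k1 + 2.0 * k2)
    return y + (k1 + 4.0 * k2 + k3) / 6.0


def rk3_relative_error(f, exact, h, x_end=6.0, y0=1.0):
    """Relative error (exact - computed) / exact at x_end, from x = 0.

    The sign convention is an assumption of this sketch.
    """
    n = round(x_end / h)
    y = y0
    for i in range(n):
        y = rk3_step(f, i * h, y, h)
    return (exact(x_end) - y) / exact(x_end)
```

With $h = 0.1$ this gives a relative error at $x = 6$ of roughly $2.31 \times 10^{-4}$ for $y' = y$ and roughly $2.71 \times 10^{-4}$ for $y' = -y$, which appears to agree with the R-K row of Table 3.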

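The third order Adams is given by

$$y^{(p)}_{n+1} = y_n + \frac{h}{12}\,(23y'_n - 16y'_{n-1} + 5y'_{n-2}) + \frac{3}{8} h^4 y^{(iv)}(\xi), \tag{5.23}$$

$$y^{(c)}_{n+1} = y_n + \frac{h}{12}\,(5y'_{n+1} + 8y'_n - y'_{n-1}) - \frac{1}{24} h^4 y^{(iv)}(\xi). \tag{5.24}$$

Analogously to the second order Adams, we set

$$y^{(t)}_{n+1} = y_n + \frac{h}{12}\,\bigl(5f(x_{n+1}, y^{(p)}_{n+1}) + 8y'_n - y'_{n-1}\bigr), \tag{5.25}$$

$$y^{(e)}_{n+1} = y^{(t)}_{n+1} - \frac{1}{10}\,\bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.26}$$

$$y^{(a)}_{n+1} = y^{(p)}_{n+1} + \bigl(y^{(ta)}_n - y^{(p)}_n\bigr), \tag{5.27}$$

$$y^{(ta)}_{n+1} = y_n + \frac{h}{12}\,\bigl(5f(x_{n+1}, y^{(a)}_{n+1}) + 8y'_n - y'_{n-1}\bigr), \tag{5.28}$$

$$y^{(ea)}_{n+1} = y^{(ta)}_{n+1} - \frac{1}{10}\,\bigl(y^{(ta)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.29}$$

$$y^{(H)}_{n+1} = y^{(p)}_{n+1} + \frac{9}{10}\,\bigl(y^{(tH)}_n - y^{(p)}_n\bigr), \tag{5.30}$$

$$y^{(tH)}_{n+1} = y_n + \frac{h}{12}\,\bigl(5f(x_{n+1}, y^{(H)}_{n+1}) + 8y'_n - y'_{n-1}\bigr), \tag{5.31}$$

$$y^{(eH)}_{n+1} = y^{(tH)}_{n+1} - \frac{1}{10}\,\bigl(y^{(tH)}_{n+1} - y^{(p)}_{n+1}\bigr). \tag{5.32}$$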


TABLE 3

Relative error at x = 6 for third order Adams.

                    y' = y              y' = -y             y' = -2xy^2

h = 0.1
    R-K          2.31 x 10^-4         2.71 x 10^-4         3.31 x 10^-5
    t           -1.50 x 10^-4        -3.80 x 10^-4        -2.06 x 10^-5
    ta          -2.32 x 10^-4        -2.47 x 10^-4        -1.82 x 10^-5
    tH          -2.24 x 10^-4        -2.60 x 10^-4        -1.85 x 10^-5
    e            5.90 x 10^-5        -7.98 x 10^-5        -9.36 x 10^-7
    ea          -1.50 x 10^-5         3.82 x 10^-5         1.31 x 10^-6
    eH          -7.38 x 10^-6         2.69 x 10^-5         1.04 x 10^-6

h = 0.2
    R-K          1.70 x 10^-3         2.35 x 10^-3         3.03 x 10^-4
    t           -5.96 x 10^-4        -4.38 x 10^-3        -1.27 x 10^-4
    ta          -1.62 x 10^-3        -1.71 x 10^-3        -1.76 x 10^-4
    tH          -1.51 x 10^-3        -1.95 x 10^-3        -1.76 x 10^-4
    e            8.14 x 10^[illegible]  -1.49 x 10^-3      9.04 x 10^-6
    ea          -1.17 x 10^-4         8.73 x 10^-4        -2.77 x 10^-5
    eH          -1.85 x 10^-5         6.58 x 10^-4        -2.78 x 10^-5

h = 0.3
    R-K          5.30 x 10^-3         8.56 x 10^-3         1.19 x 10^-3
    t           -4.58 x 10^-4        -2.06 x 10^-2        -3.48 x 10^-5
    ta          -4.49 x 10^-3        -3.70 x 10^-3        -8.99 x 10^-4
    tH          -4.06 x 10^-3        -5.15 x 10^-3        -4.05 x 10^-4
    e            3.56 x 10^[illegible]  -8.84 x 10^[illegible]   2.52 x 10^[illegible]
    ea           1.33 x 10^[illegible]   6.04 x 10^[illegible]  -4.05 x 10^[illegible]
    eH           2.67 x 10^[illegible]   4.77 x 10^[illegible]  -3.97 x 10^[illegible]

To get started, we can estimate $y_1$ and $y_2$ by some means and then take $y^{(a)}_3 = y^{(p)}_3$; this amounts to approximating $y_3$ by $y^{(t)}_3$ and is what was done to get the values in Table 3. Alternatively one can estimate $y_1$, $y_2$, and $y_3$ by some means, and then take either

$$y^{(a)}_4 = y^{(p)}_4 + \bigl(y^{(t)}_3 - y^{(p)}_3\bigr)$$

or

$$y^{(H)}_4 = y^{(p)}_4 + \frac{9}{10}\,\bigl(y^{(t)}_3 - y^{(p)}_3\bigr);$$

subsequently one uses either (5.27) or (5.30).
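A compact sketch of the third order traditional scheme, (5.23) followed by one pass of (5.25), is given below. The function name, the use of growing Python lists in place of the calculator's few memory locations, $x_0 = 0$, and the requirement that the caller supply three starting values are all assumptions of this example.

```python
import math


def adams3_t(f, start, h, n_steps):
    """Third order Adams: predictor (5.23), one corrector pass (5.25).

    start = [y_0, y_1, y_2] at x = 0, h, 2h (how these are obtained is
    left to the caller). Returns [y_0, ..., y_n].
    """
    ys = list(start)
    dys = [f(i * h, y) for i, y in enumerate(ys)]   # y'_0, y'_1, y'_2
    for n in range(2, n_steps):
        x_next = (n + 1) * h
        y_p = ys[n] + h / 12.0 * (23.0 * dys[n] - 16.0 * dys[n - 1]
                                  + 5.0 * dys[n - 2])
        y_t = ys[n] + h / 12.0 * (5.0 * f(x_next, y_p) + 8.0 * dys[n]
                                  - dys[n - 1])
        ys.append(y_t)
        dys.append(f(x_next, y_t))
    return ys
```

The adjusted and Hamming variants differ only in replacing `y_p` inside the corrector by $y^{(a)}_{n+1}$ of (5.27) or $y^{(H)}_{n+1}$ of (5.30), at the cost of one more stored difference.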

Some numerical results are given in Table 3. The column headed $y' = -2xy^2$ is quite erratic. This bears out the remark at the top of p. 210 in Hamming, 1962, that the corresponding solution is often troublesome to approximate by polynomials.

For the equation $y' = ky$, we get stability in the ranges shown in Table 4. These bounds should be satisfied by (5.10). For the case $h = 0.3$ for $y' = -2xy^2$, the bounds are not satisfied by (5.10) for a region near $x = 1$.

TABLE 4

Stability ranges for y' = ky for Adams third order.

    t      -0.8 ≤ hk
    e      -0.9 ≤ hk
    ta     -0.5 ≤ hk
    ea     -0.5 ≤ hk
    tH     -0.5 ≤ hk
    eH     -0.5 ≤ hk

However, we got through the region of instability in two or three steps, which were not enough for the instability to build up appreciably. For most of the range of integration, (5.10) was well within the bounds of stability.

As with the second order Adams method, use of extrapolated values does not diminish the region of stability.

To get some feeling how erratic some of the values are in Table 3, note that R-K, t, ta, and tH are supposed to be third order methods. Thus the error for $h = 0.2$ should be 8 times that for $h = 0.1$, and the error for $h = 0.3$ should be 27 times that for $h = 0.1$. This works out reasonably well except for t.
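The ratio test just described can be turned into an "observed order" by taking logarithms. The sketch below applies it to three of the $y' = y$ entries of Table 3 at $h = 0.1$ and $h = 0.2$; the function name and the choice of rows are assumptions of this example.

```python
import math


def observed_order(err_h1, err_h2, h1=0.1, h2=0.2):
    """Order p implied by |err| ~ C * h**p at two step lengths."""
    return math.log(abs(err_h2) / abs(err_h1)) / math.log(h2 / h1)


# Relative errors for y' = y from Table 3 at h = 0.1 and h = 0.2.
table3_y_prime_eq_y = {
    "R-K": (2.31e-4, 1.70e-3),
    "t": (-1.50e-4, -5.96e-4),
    "tH": (-2.24e-4, -1.51e-3),
}

for name, (e1, e2) in table3_y_prime_eq_y.items():
    print(name, round(observed_order(e1, e2), 2))
```

An observed order near 3 corresponds to the factor of 8 expected of a third order method; the "t" row falls well short of it (close to 2), which is the erratic behavior remarked on in the text.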

As pointed out on p. 186 of Ralston, 1965, use of

$$y^{(c)}_{n+1} - \frac{1}{10}\,\bigl(y^{(c)}_{n+1} - y^{(p)}_{n+1}\bigr) \tag{5.33}$$

should give a fourth order method. As $y^{(t)}_{n+1}$, $y^{(ta)}_{n+1}$, and $y^{(tH)}_{n+1}$ are close to $y^{(c)}_{n+1}$, use of (5.26), (5.29), and (5.32) means that e, ea, and eH should be close to fourth order methods. So we look for the error for $h = 0.2$ to be 16 times that for $h = 0.1$, and the error for $h = 0.3$ to be 81 times that for $h = 0.1$. None of e, ea, or eH comes very close to such behavior. Nor can one count on a striking increase in accuracy from using extrapolated values. In two cases eH is only barely better than tH, and in one case ea is poorer than ta. Considering that there is no way to estimate step by step error when one is using extrapolated methods, they should probably not be used.

For the fourth order Runge-Kutta, consult (4.3) and (4.4), which are generalized versions. For the fourth order Adams method (see Conte and de Boor, 1972, p. 342 and p. 351), we set

$$y^{(p)}_{n+1} = y_n + \frac{h}{24}\,(55y'_n - 59y'_{n-1} + 37y'_{n-2} - 9y'_{n-3}) + \frac{251}{720} h^5 y^{(v)}(\xi), \tag{5.34}$$

$$y^{(c)}_{n+1} = y_n + \frac{h}{24}\,(9y'_{n+1} + 19y'_n - 5y'_{n-1} + y'_{n-2}) - \frac{19}{720} h^5 y^{(v)}(\xi). \tag{5.35}$$

These give

$$y^{(t)}_{n+1} = y_n + \frac{h}{24}\,\bigl(9f(x_{n+1}, y^{(p)}_{n+1}) + 19y'_n - 5y'_{n-1} + y'_{n-2}\bigr), \tag{5.36}$$

$$y^{(e)}_{n+1} = y^{(t)}_{n+1} - \frac{19}{270}\,\bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.37}$$

$$y^{(a)}_{n+1} = y^{(p)}_{n+1} + \bigl(y^{(ta)}_n - y^{(p)}_n\bigr), \tag{5.38}$$

TABLE 5

Relative error at x = 6 for fourth order Adams.

                    y' = y              y' = -y             y' = -2xy^2

h = 0.1
    R-K          4.62 x 10^-6        -5.44 x 10^-6        -8.10 x 10^-7
    t           -7.65 x 10^-6         2.73 x 10^-5         9.87 x 10^-7
    ta          -1.39 x 10^-5         1.61 x 10^-5         1.02 x 10^-6
    tH          -1.35 x 10^-5         1.69 x 10^-5         1.02 x 10^-6
    e            5.11 x 10^-6         7.50 x 10^-6        -5.63 x 10^-7
    ea          -8.10 x 10^-7        -2.88 x 10^-6        -5.41 x 10^-7
    eH          -4.32 x 10^-7        -2.17 x 10^-6        -5.40 x 10^-7

h = 0.2
    R-K          6.77 x 10^-5        -9.46 x 10^-5        -1.37 x 10^-5
    t           -2.98 x 10^-5         6.93 x 10^-4        -5.46 x 10^-5
    ta          -1.81 x 10^-4         2.18 x 10^-4         5.65 x 10^-6
    tH          -1.70 x 10^-4         2.48 x 10^-4         2.11 x 10^-6
    e            1.34 x 10^-4         2.94 x 10^-4        -4.97 x 10^-5
    ea          -7.66 x 10^-6        -1.44 x 10^-4         4.80 x 10^-6
    eH           2.98 x 10^-6        -1.15 x 10^-4         1.60 x 10^-6

h = 0.3
    R-K          3.16 x 10^-4        -5.21 x 10^-4        -7.19 x 10^-5
    t            1.71 x 10^-4         5.30 x 10^-3        -9.46 x 10^-4
    ta          -6.85 x 10^-4         5.39 x 10^-4         5.59 x 10^-4
    tH          -6.20 x 10^-4         8.28 x 10^-4         4.55 x 10^-4
    e            8.43 x 10^-4         2.74 x 10^-3        -6.92 x 10^-4
    ea           3.70 x 10^-5        -1.61 x 10^-3         6.37 x 10^-4
    eH           9.81 x 10^-5        -1.35 x 10^-3         5.48 x 10^-4

$$y^{(ta)}_{n+1} = y_n + \frac{h}{24}\,\bigl(9f(x_{n+1}, y^{(a)}_{n+1}) + 19y'_n - 5y'_{n-1} + y'_{n-2}\bigr), \tag{5.39}$$

$$y^{(ea)}_{n+1} = y^{(ta)}_{n+1} - \frac{19}{270}\,\bigl(y^{(ta)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{5.40}$$

$$y^{(H)}_{n+1} = y^{(p)}_{n+1} + \frac{251}{270}\,\bigl(y^{(tH)}_n - y^{(p)}_n\bigr), \tag{5.41}$$

$$y^{(tH)}_{n+1} = y_n + \frac{h}{24}\,\bigl(9f(x_{n+1}, y^{(H)}_{n+1}) + 19y'_n - 5y'_{n-1} + y'_{n-2}\bigr), \tag{5.42}$$

$$y^{(eH)}_{n+1} = y^{(tH)}_{n+1} - \frac{19}{270}\,\bigl(y^{(tH)}_{n+1} - y^{(p)}_{n+1}\bigr). \tag{5.43}$$

The  matter  of  getting  started  can  be  handled  as  in  the  second  and  third 
order  Adams  methods. 
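The fourth order scheme has the same shape as the lower orders, with the Adams weights 55, -59, 37, -9 (predictor) and 9, 19, -5, 1 (corrector) over 24, and the extrapolation factor 19/270. The sketch below is illustrative only: the function name, the `extrapolate` switch, the list-based history, $x_0 = 0$, and the requirement of four caller-supplied starting values are assumptions of this example.

```python
import math


def adams4_t(f, start, h, n_steps, extrapolate=False):
    """Fourth order Adams: predictor, then one pass of the corrector.

    start = [y_0, y_1, y_2, y_3] at x = 0, h, 2h, 3h. With
    extrapolate=True each accepted value is y_t - (19/270)*(y_t - y_p)
    instead of y_t. Returns [y_0, ..., y_n].
    """
    ys = list(start)
    dys = [f(i * h, y) for i, y in enumerate(ys)]
    for n in range(3, n_steps):
        x_next = (n + 1) * h
        y_p = ys[n] + h / 24.0 * (55.0 * dys[n] - 59.0 * dys[n - 1]
                                  + 37.0 * dys[n - 2] - 9.0 * dys[n - 3])
        y_t = ys[n] + h / 24.0 * (9.0 * f(x_next, y_p) + 19.0 * dys[n]
                                  - 5.0 * dys[n - 1] + dys[n - 2])
        y_new = y_t - 19.0 / 270.0 * (y_t - y_p) if extrapolate else y_t
        ys.append(y_new)
        dys.append(f(x_next, y_new))
    return ys
```

The factor 19/270 is $K_c/(K_p + K_c)$ with $K_p = 251/720$ and $K_c = 19/720$, the same construction that gives 1/6 at order two and 1/10 at order three.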

Some numerical results are given in Table 5. The extrapolated values are distinctly erratic, as is also $y^{(t)}_{n+1}$. However, $y^{(ta)}_{n+1}$ behaves fairly well. The excellent results from the Runge-Kutta are worth remarking.

For the equation $y' = ky$, we get stability in the ranges shown in Table 6. These bounds should be satisfied by (5.10). For $h = 0.3$ for $y' = -2xy^2$, this was not the case for a short interval, not long enough to cause any trouble.


TABLE 6

Stability ranges for y' = ky for Adams fourth order.

    t      [illegible]
    e      -0.6 ≤ hk
    ta     [illegible]
    ea     -0.4 ≤ hk
    tH     [illegible]
    eH     -0.4 ≤ hk

Use of extrapolated values does not diminish the region of stability.

predictor-correctors. In Southard and Yowell, 1952, is given

$$y^{(p)}_{n+1} = -4y_n + 5y_{n-1} + h\,(4y'_n + 2y'_{n-1}) + \frac{h^4}{6}\, y^{(iv)}(\xi),$$

$$y^{(c)}_{n+1} = y_n + \frac{h}{12}\,(5y'_{n+1} + 8y'_n - y'_{n-1}) - \frac{h^4}{24}\, y^{(iv)}(\xi).$$

One will recognize that the corrector is the same as for the third order Adams, to wit (5.24). So the formula for $y^{(t)}_{n+1}$ would be the same as (5.25). We would have

$$y^{(e)}_{n+1} = y^{(t)}_{n+1} - \frac{1}{5}\,\bigl(y^{(t)}_{n+1} - y^{(p)}_{n+1}\bigr), \tag{6.3}$$

and the other formulas analogously.

This  predictor-corrector  has  some  good  features.  It  is  third  order,  but 

has  modest  memory  requirements,  and  is  easy  to  get  started,  or  get  restarted 

after  changing  the  length  of  the  step.  However,  it  behaves  poorly  as  to 

stability,  as  can  be  seen  from  Table  7 and  Table  8.  The  fact  that  all  positive 

values  of  hk  are  excluded  for  ea  and  eH  is  startling. 

Because  the  range  of  stability  for  is  so  limited,  one  finds  (3.10) 

outside that range in the case $h = 0.3$ and $y' = -2xy^2$ for an extended period, and the solution really blows up. Already at $x = 3$, one gets a negative value for $y$, after which the errors compound catastrophically. Before reaching $x = 6$, the calculator stops on an overflow, because the numbers are too large for its capacity.

If one had been monitoring (5.10), this instability could have been avoided by using a smaller $h$ through the critical region.

The analysis of the stability for $y^{(ea)}_{n+1}$ and $y^{(eH)}_{n+1}$ would be worthy of special attention by anyone interested in stability. Neither the usual Dahlquist criterion of stability nor the Hamming-Ralston criterion for relative stability is applicable, and a special definition had to be contrived. Never mind the details. The reader can trust that if (5.10) stays too long outside the bounds

TABLE 7


Relative error at x = 6 for Southard and Yowell.


h - 0.3 


2.31  x 10 
-1.64  x lo" 
-2.32  x 10' 
-2.17  x lo" 

1.18  x lo' 

unstable 

unstable 


1.70  x io 
-8.99  x i0- 
-1.63  x 10* 
-1.45  x lo' 
1.70  x lo' 
unstable 
unstable 


5. 30  x 10 


2.71  x 10 
-4.18  x lo" 
-2.48  x io" 
-2.73  x io" 
-1.52  x io" 
9.43  x io" 
5.05  x io" 


2.35  x 10 
-7.16  x io" 
-1.75  x io" 
-2.20  x lo" 
-2.84  x lo" 
1.79  x io" 
1.09  x io" 


8.56  x 10 


3.31  x 10 
-1.50  x 10~ 

-2.01  x lo" 

-2.09  x io" 
-1.82  x lo" 
5.77  x io" 
4.16  x io" 


3.03  x io 
4.54  x lo" 
7.17  x io" 

-1. 34  x lo" 
4.00  x lo" 

4.03  x lo" 
1.05  x lo" 


1.19  x 10 


-2.09  x 10 


-2.79  x io 


unstable 


-4.66  x 10 


unstable 


1.30  x 10 


-4.00  x 10 
7.79  x lo" 


unstable 
-1.72  x io" 


3.45  x 10 
1.59  x io" 


unstable 


unstable 


unst  able 


unstable 


unstable 


1.04  x io 


TABLE 8


Stability ranges for y' = ky for Southard and Yowell.

given in Table 8, there will be trouble; see the case $h = 0.3$ for $y' = -2xy^2$, for example.

In  Hamming,  1959,  on  p.  47,  concern  is  expressed  about  the  stability  of 
Southard  and  Yowell,  and  it  is  proposed  to  remedy  the  situation  by  using  a 
different  predictor 


(6.4)    y^{(p)}_{n+1} = y_n + y_{n-1} - y_{n-2} + 2h(y'_n - y'_{n-1}) + \frac{h^4}{3} y^{(iv)}(\xi).

This  appreciably  increases  the  number  of  memory  locations  needed  and  requires 
more  starting  values  to  get  started  or  restarted,  thus  cancelling  out  the  two 
good  features  of  Southard  and  Yowell.  It  does  appear  to  improve  the  stability; 
for y^| it is increased to -0.5 ≤ hk. However, this is considerably less
than  for  the  Adams  method  of  order  three.  If  we  compare  the  Adams  of  order 
three  with  what  results  from  (6.4)  we  find  that  both  are  of  order  three,  both 
need  two  extra  starting  values  to  get  started  or  restarted,  but  the  Adams  is 
considerably  more  stable  and  requires  fewer  memory  locations. 
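
The order and error term quoted for (6.4) can be checked by Taylor expansion about x_n. The following is a sketch, assuming (6.4) reads y^{(p)}_{n+1} = y_n + y_{n-1} - y_{n-2} + 2h(y'_n - y'_{n-1}) and that y is smooth enough; all derivatives are evaluated at x_n and terms beyond h^4 are dropped.

```latex
\begin{align*}
y_{n-1} - y_{n-2} &= h y' - \tfrac{3}{2} h^2 y'' + \tfrac{7}{6} h^3 y''' - \tfrac{5}{8} h^4 y^{(iv)} + \cdots \\
2h\,(y'_n - y'_{n-1}) &= 2 h^2 y'' - h^3 y''' + \tfrac{1}{3} h^4 y^{(iv)} + \cdots \\
y_n + y_{n-1} - y_{n-2} + 2h\,(y'_n - y'_{n-1})
  &= y + h y' + \tfrac{1}{2} h^2 y'' + \tfrac{1}{6} h^3 y''' - \tfrac{7}{24} h^4 y^{(iv)} + \cdots \\
y_{n+1} &= y + h y' + \tfrac{1}{2} h^2 y'' + \tfrac{1}{6} h^3 y''' + \tfrac{1}{24} h^4 y^{(iv)} + \cdots
\end{align*}
```

The last two lines agree through h^3, which is the sense in which the predictor is of order three, and they differ by (1/24 + 7/24) h^4 y^{(iv)} = (h^4/3) y^{(iv)}, which is the leading error term.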

In Hamming, 1959, it was proposed to render the classical Milne
predictor-corrector stable by using a different corrector, namely

(6.5)    y_{n+1} = \frac{1}{8}(9y_n - y_{n-2}) + \frac{3h}{8}(y'_{n+1} + 2y'_n - y'_{n-1}) - \frac{h^5}{40} y^{(v)}(\xi);

the old predictor, (4.5), was retained. The results of this are shown in Table
9 and Table 10. In Table 9 there is no column y' = -2xy because the memory
requirements exceeded the capacity of the HP-65 with which the calculations of
this paper were performed. Although there is a region of stability, it is less
than for the Adams method of comparable order, namely order four. All in all, the
Hamming method is not a contender for use with hand held calculators.
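
For readers who want to experiment, the following is a minimal Python sketch of one way to code Hamming's method. It assumes the usual form of the scheme: the Milne predictor (4.5), a modified predictor using the factor 112/121, the corrector (6.5), and a final adjustment using the factor 9/121. The function name `hamming_solve`, its arguments, and the choice of exact starting values are illustrative assumptions, not the program that was run on the HP-65.

```python
import math

def hamming_solve(f, x0, start, h, n_steps):
    """Advance y' = f(x, y) by n_steps steps of Hamming's method.

    `start` holds four starting values y_0..y_3 at x0, x0+h, x0+2h, x0+3h.
    Returns the list of accepted values y_0, y_1, ...
    """
    y = list(start)
    d = [f(x0 + i * h, y[i]) for i in range(4)]   # derivative values y'_i
    c_minus_p = 0.0   # corrector minus predictor of the previous step (none yet)
    for n in range(3, 3 + n_steps):
        x_next = x0 + (n + 1) * h
        # Milne predictor (assumed form of (4.5))
        p = y[n - 3] + 4.0 * h / 3.0 * (2.0 * d[n] - d[n - 1] + 2.0 * d[n - 2])
        # Modified predictor: add 112/121 of the previous corrector-predictor gap
        m = p + 112.0 / 121.0 * c_minus_p
        # Hamming corrector (6.5), with the derivative taken at the modified predictor
        c = (9.0 * y[n] - y[n - 2]
             + 3.0 * h * (f(x_next, m) + 2.0 * d[n] - d[n - 1])) / 8.0
        # Accepted value: remove 9/121 of the current corrector-predictor gap
        y_new = c - 9.0 / 121.0 * (c - p)
        c_minus_p = c - p
        y.append(y_new)
        d.append(f(x_next, y_new))
    return y

# Example: y' = -y, y(0) = 1, h = 0.1, carried to x = 6 from exact starting values.
h = 0.1
ys = hamming_solve(lambda x, y: -y, 0.0, [math.exp(-i * h) for i in range(4)], h, 57)
rel_err = (ys[60] - math.exp(-6.0)) / math.exp(-6.0)
print(f"relative error at x = 6: {rel_err:.2e}")
```

Because the starting values here are exact, the printed error need not match the tabulated entries digit for digit; the sketch is meant only to make the sequence of operations and the storage they require concrete.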

From  Ralston,  1965,  one  would  get  the  impression  that  Hamming's  method  is 

very  superior.  It  is  y^j^  that  gets  the  stamp  of  approval  on  p.  189  of 

Ralston,  1965.  Actually,  y^*^  would  be  simpler,  and  gives  about  the  same 

results.  We  do  not  believe  that  y^!^  is  enough  superior  to  y^f^  to  be 

n+l  n+l 
TABLE 9


Relative error at x = 6 for Hamming.


                        y' = y                 y' = -y

h = 0.1    R-K       4.62 x 10^-6          -5.44 x 10^-6
           t        -8.63 x 10^-6           4.55 x 10^-5
           ta       -1.68 x 10^-5           2.10 x 10^-5
           tH       -1.61 x 10^-5           2.24 x 10^-5
           e         4.24 x 10^-6           7.52 x 10^-6
           ea        [illegible] x 10^-7   -2.55 x 10^-6
           eH       -4.32 x 10^-7          -1.92 x 10^-6

h = 0.2    R-K       6.77 x 10^-5          -9.46 x 10^-5
           t        -3.87 x 10^[illegible]  2.37 x 10^-3
           ta       -2.07 x 10^-4           2.82 x 10^-4
           tH       -1.92 x 10^-4           3.35 x 10^-4
           e         1.07 x 10^-4           3.73 x 10^-4
           ea       -8.49 x 10^-6          -1.15 x 10^-4
           eH        1.60 x 10^-6          -9.44 x 10^-5

h = 0.3    R-K       3.16 x 10^-4          -5.21 x 10^-4
           t         1.05 x 10^-4           unstable
           ta       -7.53 x 10^-4           6.78 x 10^-4
           tH       -6.75 x 10^-4           1.08 x 10^-3
           e         6.63 x 10^-4           5.90 x 10^-3
           ea        2.32 x 10^-5          -1.10 x 10^-3
           eH        8.02 x 10^-5          -9.56 x 10^-4

TABLE 10

Stability ranges for y' = ky for Hamming.


           t        -0.26 ≤ hk              e        -0.38 ≤ hk
           ta       -0.37 ≤ hk              ea       -0.38 ≤ hk
           tH       -0.38 ≤ hk              eH       -0.30 ≤ hk

worth the extra trouble of programming and the longer running time that it
entails.


Despite Ralston's endorsement of Hamming, Enright and Hull, 1976, give it a
very low rating on pp. 954-955. Interestingly enough, though Hamming invented
the method, and devotes a lot of space to discussing it in Hamming, 1962, he
finally (on pp. 206-210) seems to favor something related to an Adams method.

On p. 46 of Hamming, 1959, the stability for y^{(eH)}_{n+1} is claimed to be
good down to about -0.65. As this disagrees with the value given in Table 10,
we will justify the value in Table 10.

For Hamming's method, we have

(6.6)    y^{(H)}_{n+1} = y^{(p)}_{n+1} + \frac{112}{121} \left( y^{(tH)}_{n} - y^{(p)}_{n} \right)

(6.7)    y^{(eH)}_{n+1} = y^{(tH)}_{n+1} - \frac{9}{121} \left( y^{(tH)}_{n+1} - y^{(p)}_{n+1} \right).

Subtracting y^{(p)}_{n+1} from both sides of (6.7) gives

(6.8)    y^{(eH)}_{n+1} - y^{(p)}_{n+1} = \frac{112}{121} \left( y^{(tH)}_{n+1} - y^{(p)}_{n+1} \right).

So, by (6.6),

(6.9)    y^{(H)}_{n+2} = y^{(p)}_{n+2} + \left( y^{(eH)}_{n+1} - y^{(p)}_{n+1} \right).

As the entries that are accepted are those with a superscript (eH), we
shall simplify the notation by omitting this. Then by (4.5) and (6.9), we see
that for the equation y' = ky, we have

(6.10)   y^{(H)}_{n+2} = \left( y_{n-2} + \frac{4hk}{3} (2y_{n+1} - y_n + 2y_{n-1}) \right)
                       + \left( y_{n+1} - \left( y_{n-3} + \frac{4hk}{3} (2y_n - y_{n-1} + 2y_{n-2}) \right) \right).

This simplifies to

(6.11)   y^{(H)}_{n+2} = y_{n+1} + y_{n-2} - y_{n-3} + \frac{4hk}{3} (2y_{n+1} - 3y_n + 3y_{n-1} - 2y_{n-2}).

We have, by (6.5),

(6.12)   y^{(tH)}_{n+2} = \frac{1}{8} \left( 9y_{n+1} - y_{n-1} + 3hk \left( y^{(H)}_{n+2} + 2y_{n+1} - y_n \right) \right)

(6.13)   y^{(tH)}_{n+2} = \frac{1}{8} \left( 9y_{n+1} - y_{n-1} + hk (9y_{n+1} - 3y_n + 3y_{n-2} - 3y_{n-3})
                        + (hk)^2 (8y_{n+1} - 12y_n + 12y_{n-1} - 8y_{n-2}) \right).

By (6.7),

(6.14)   y_{n+2} = \frac{112}{121} y^{(tH)}_{n+2} + \frac{9}{121} y^{(p)}_{n+2}.

By (4.5) and (6.13), this gives

(6.15)   y_{n+2} = \frac{1}{121} \left( 126y_{n+1} - 14y_{n-1} + 9y_{n-2}
                 + hk (150y_{n+1} - 54y_n + 24y_{n-1} + 42y_{n-2} - 42y_{n-3})
                 + (hk)^2 (112y_{n+1} - 168y_n + 168y_{n-1} - 112y_{n-2}) \right).

The solution of this difference equation is

(6.16)   y_n = \sum_{i=1}^{5} C_i p_i^n

where the p_i are the roots of the equation

(6.17)   121p^5 - (126 + 150hk + 112(hk)^2) p^4 + (54hk + 168(hk)^2) p^3
              + (14 - 24hk - 168(hk)^2) p^2 - (9 + 42hk - 112(hk)^2) p + 42hk = 0.

If we take hk = -0.4, we get

(6.18)   121p^5 - 83.92p^4 + 5.28p^3 - 3.28p^2 + 25.72p - 16.8 = 0.

The polynomial has the factors

(6.19)   p - 0.6705310701

(6.20)   p^2 + 0.9276494248p + 0.4538323357

(6.21)   p^2 - 0.9506720739p + 0.4562570288.

The zeros of (6.20) have absolute value

         0.6736707918

and the zeros of (6.21) have absolute value

         0.6754680072.

For large n, the powers of these quantities will predominate, and the values
of y_n will jump about very erratically.

For hk = -0.39, the factor corresponding to (6.19) will produce the
dominant p, and (6.16) will approximate e^{kx}, as it should.
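
The root calculation can be repeated for any hk with a short program. The following Python sketch assumes that (6.17) is 121p^5 - (126 + 150hk + 112(hk)^2)p^4 + (54hk + 168(hk)^2)p^3 + (14 - 24hk - 168(hk)^2)p^2 - (9 + 42hk - 112(hk)^2)p + 42hk = 0. The names `char_poly`, `poly_eval`, and `poly_roots` are illustrative, and the Durand-Kerner iteration is simply a convenient standard-library-only root finder, not necessarily the method used to obtain (6.19) through (6.21).

```python
def char_poly(hk):
    """Coefficients of the assumed form of (6.17), highest power first."""
    q = hk * hk
    return [121.0,
            -(126.0 + 150.0 * hk + 112.0 * q),
            54.0 * hk + 168.0 * q,
            14.0 - 24.0 * hk - 168.0 * q,
            -(9.0 + 42.0 * hk - 112.0 * q),
            42.0 * hk]

def poly_eval(coeffs, z):
    """Evaluate a polynomial at z by Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * z + c
    return acc

def poly_roots(coeffs, iterations=500):
    """All roots by Durand-Kerner iteration (assumes the roots are simple)."""
    lead = coeffs[0]
    monic = [c / lead for c in coeffs]
    n = len(monic) - 1
    z = [(0.4 + 0.9j) ** k for k in range(n)]   # distinct, asymmetric starting points
    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(z):
            denom = 1.0 + 0.0j
            for j, zj in enumerate(z):
                if j != i:
                    denom *= zi - zj
            updated.append(zi - poly_eval(monic, zi) / denom)
        z = updated
    return z

for hk in (-0.40, -0.39):
    roots = sorted(poly_roots(char_poly(hk)), key=abs, reverse=True)
    print(f"hk = {hk}")
    for r in roots:
        print(f"  root {r.real:+.10f} {r.imag:+.10f}j   |root| = {abs(r):.10f}")
```

Listing the roots in order of decreasing absolute value shows at a glance whether the real root that approximates e^{hk} dominates, or whether a complex pair does.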

Amongst the disadvantages of the Hamming method for hand held calculators
is its excessive requirement for memory locations. To improve this, one could
change the predictor to

(6.22)   y^{(p)}_{n+1} = -9y_n + 9y_{n-1} + y_{n-2} + h(6y'_n + 6y'_{n-1}) + \frac{h^5}{10} y^{(v)}(\xi).

When used with the Hamming corrector, (6.5), this still gives a fourth order
method. However, it requires two fewer memory locations. Also, it requires
fewer starting values to get a start or restart, being in this respect also
superior to the Adams method of order four.
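
The order and error constant of a formula such as (6.22) can be checked mechanically by applying it to the monomials 1, x, x^2, ... with exact rational arithmetic. The following Python sketch does this under the assumption that (6.22) has the coefficients -9, 9, 1 on y_n, y_{n-1}, y_{n-2} and 6, 6 on hy'_n, hy'_{n-1}, and that (6.5) has the coefficients 9/8, 0, -1/8 and 3/8, 6/8, -3/8. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def approx_monomial(a, b, m):
    """Value the formula assigns to y(x) = x**m at x = 1, taking h = 1 and x_n = 0.

    a[j] multiplies y_{n-j}; b[j] multiplies h*y'_{n+1-j}, so b[0] is the
    implicit term and is zero for a predictor.
    """
    total = Fraction(0)
    for j, coeff in enumerate(a):
        total += Fraction(coeff) * Fraction(-j) ** m
    if m > 0:
        for j, coeff in enumerate(b):
            total += Fraction(coeff) * m * Fraction(1 - j) ** (m - 1)
    return total

def order_and_error_constant(a, b, max_degree=12):
    """Return (order, C) where the leading error term is C * h**(order+1) * y^(order+1)."""
    for m in range(max_degree + 1):
        defect = 1 - approx_monomial(a, b, m)
        if defect != 0:
            return m - 1, defect / factorial(m)
    raise ValueError("formula is exact up to max_degree")

# Predictor (6.22) as assumed above
print(*order_and_error_constant([-9, 9, 1], [0, 6, 6]))          # 4 1/10
# Hamming corrector (6.5) as assumed above
print(*order_and_error_constant(
    [Fraction(9, 8), 0, Fraction(-1, 8)],
    [Fraction(3, 8), Fraction(6, 8), Fraction(-3, 8)]))          # 4 -1/40
```

Both formulas come out as fourth order, which is consistent with the statement that the combination is still a fourth order method. The check says nothing about stability, which has to be examined separately.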

This predictor is one of a set which, in a footnote on p. 171 of Hamming,
1962, is dismissed as not being worth consideration. The combination of (6.22)
and (6.5) does turn out to have very poor stability characteristics. For y^j
one must have -0.13 ≤ hk, and for y^f^ one must have -0.19 ≤ hk ≤ 0.28.
At this point, we became disheartened, and did not check out the other cases.

7. Conclusions. Despite heroic efforts to remedy various faults of the
Milne type predictor-correctors, they are still decidedly inferior to the
Adams type. One would prefer that the Adams type results were less erratic
than they are, but if the user is diligent to keep (5.10) within bounds, it
appears that one can use the Adams methods safely, and without running into
excessive calculation times. An occasional "look ahead" to plan the progress
of the calculation is advisable.

Runge-Kutta  methods  have  many  advantages  and,  if  it  happens  that  f (x,y) 
can  be  evaluated  quickly,  they  should  be  given  serious  consideration. 

Although  the  three  Runge-Kutta's  given  in  this  paper  are  relatively  stable 
under  all  circumstances,  so  that  one  can  use  them  without  any  concern  about 
stability,  one  must  still  monitor  the  size  of  h carefully,  since  if  one 
allows  the  step  by  step  errors  to  become  excessive  the  final  results  can  be 
almost  without  meaning. 

THE MECHANICAL TRAIN ANALOG
A PROPOSED SOFTWARE EVALUATION TOOL


Peter Beck
US Army Armament R&D Command
Product Assurance Directorate
Dover, New Jersey 07801


ABSTRACT  - The  current  method  of  evaluating  and  qualifying 
software  is  accomplished  by  having  a panel  of  experts 
exercise the software system. This is a nonanalytical,
subjective  procedure,  since  it  is  impossible  to  expose  a 
specific  software  system  to  a sufficiently  widespread 
scrutiny,  by  sufficiently  diverse  interests,  to  assure  that 
enough  significant  points  of  potential  weakness  have  been 
exposed.  This  shortcoming  of  existing  software  evaluation 
tools  (SET)  is  the  reason  for  proposing  the  mechanical 
train  analog  (MTA) . 

1.  INTRODUCTION 

The  automated  battlefield  is  a new  and  unique  commodity 
where,  by  definition,  the  functioning  is  controlled  by 
devices  that  replace  the  human  elements  of  observation, 
effort,  and  data  reduction.  This  system  has  two  major 
elements:  hardware  and  software.  Hardware  product 
assurance  is  reasonably  well  understood,  while  for  the 
associated  software  system  product  assurance  is  an  emerging 
specialty. 

The  software  problem  in  need  of  clarification  is  how  to 
translate  the  user/customer  quality  notions  of  the  software 
into  quantitative  terms  (i.e.  scoring  criteria);  and  to 
describe,  in  the  broadest  sense,  those  activities  that  must 
be  performed  by  the  manufacturing  organizations  to  assure 
uniformity  of  the  product  and  customer  satisfaction.  The 
method  of  quantifying  the  customers'  quality  notions  must 
proceed  according  to  a formalized  methodology  consistent 
with  the  scientific  method,  if  customer  acceptance  is  to  be 
achieved.  This  methodology  is  commonly  called  product 
assurance.  The  product  assurance  task,  then,  is  to 
identify and quantify customer satisfaction measures for
all phases of the life cycle. Thus, the question of
scoring criteria for initial qualification, replication,
and follow-on procurements must be addressed. The
preliminary  evidence  is  that  for  software  systems  the  lack 
of  common  or  at  least  consistent  standards  for  system 
specifications,  test  plans,  and  procedures  (supportive 
manuals,  handbooks,  and  guides)  contributes  to  confusion 
and  a lack  of  understanding  of  the  scoring  criteria 
applicable  to  the  software  system. 

How can we tell when the product assurance discipline
is  sufficiently  developed  to  assure  customer  satisfaction 
with  a software  system?  Lord  Kelvin  said  "When  you  can 
measure  what  you  are  speaking  about,  and  express  it  in 
numbers,  you  know  something  about  it;  but  when  you  cannot 
express  it  in  numbers,  your  knowledge  is  of  a meager  and 
unsatisfactory  kind.  It  may  be  the  beginning  of  knowledge, 
but  you  have  scarcely,  in  your  thoughts,  advanced  to  the 
stage  of  science,  whatever  the  matter  may  be."  This  desire 
to have standardized quantitative measures is still valid
today. It assures widespread understanding of the
commodity and, therefore, has the potential for achieving
widespread customer satisfaction.

2.  MISSION  AREA  RESPONSIBILITIES 

It is the responsibility of the Selected Armaments
System Division (SASD) to perform the following tasks for
mine systems, which include mines and mine dispensing
systems: to design, develop, and evaluate test programs
for determining the safety and reliability, availability
and maintainability (S/RAM) engineering characteristics for
demonstration of the system and subsystems' achievements.
In addition, SASD performs the required safety engineering
functions, including the safety statement, for these same
materials.

The  first  mine  system  integratable  into  the  automated 
battlefield  is  currently  under  development  and  is  called 
the  Family  of  Scatterable  Mines  (FASCAM).  The  introduction 
of  FASCAM  into  the  Army's  inventory  will  provide  the  field 
commander  with  a revolutionary  capability,  e.g., 
the  use  of  mine  field  barriers  as  a tactical  weapon  if  the 
compliance  with  the  safety  and  RAM  requirements  can  be 
adequately  demonstrated  to  the  user.  In  utilizing  mine 
fields  in  this  new  role,  however,  we  must  first  be  aware  of 
the  extensive  safety  precautions  and  techniques 
traditionally  employed  to  control  mine  fields  seeded  with 
conventional,  nonscatterable  mines.  Then,  we  must 
recognize  the  need  to  address  new  and  better  techniques  for 
achieving  a comparable  or  improved  level  of  control  and 
command over mine fields that are no longer hand-emplaced,
but  remotely  scattered  and  sown  in  large  quantities  from 
high  speed  delivery  systems  at  very  rapid  rates.  The 
consideration  of  these  command  and  control  aspects  of 
scatterable  mines  has  led  to  the  formulation  of  a concept 
which  has  been  given  the  hypothetical  designation  of 
Minefield  Information  System  (MIS). 


To  permit  the  greatest  possible  utility  of  the  FASCAM 
mining  systems  by  the  field  commander,  the  MIS  shall  be 
comprised  of  an  interactive  graphics  display,  automatic 
data  processing  equipment,  computer  programs,  and 
appropriate communications links. The MIS will facilitate
the commander's decision making by providing him with
up-to-the-minute intelligence processed in real time from all data
vital and pertinent to the conduct of mining operations.

The  information  processed  by  the  MIS  command  and 
control  center  and  displayed  on  command  at  division 
headquarters  will  include,  in  part:  terrain,  target  and 
weather  information;  a record  of  types  and  quantities  of 
mines  and  delivery  systems  in  the  division's  supply  system 
and  immediately  available  for  deployment;  the  recommended 
number  of  minefields,  minefield  sizes,  mixes  densities  and 
optimum  locations  of  minefields  to  meet  a particular  threat 
and/or  to  accomplish  an  offensive  objective;  exact  times  of 
emplacing  the  various  minefields;  and  an  automatic  armed 
life  countdown  to  facilitate  subsequent  safe  passage  of 
friendly  troops  or  to  facilitate  reseeding  requirements. 
Additionally,  MIS  will  have  the  capability  to  accurately 
map  and  display  minefield  locations  and  will  provide  a 
semiautomated communications link between the commander and
his  tactically  dispersed  minefields  for  the  purpose  of 
complete  command  and  control  of  tactical  minefields. 

An  MIS  functional  flow  chart  for  a command  and  control 
barrier  system  might  be  represented  as  in  figure  1.  Based 
on  intelligence  received  from  the  tactical  operating  system 
(TOS) or directly through the army communications system
and  based  on  the  mine  warfare  data  base  stored  in  the 
computer,  the  MIS  will  recommend  optimum  mine  fields  and 
provide  minefield  mapping.  In  addition,  it  will  keep  an 
up-to-the-minute graphically displayed status of the
minefields  for  the  field  commander's  use  in  his  overall 
tactical  planning  and,  upon  action  by  the  field  commander, 
the  MIS  command  and  control  center  will  signal  the 
minefields  to  destruct.  Further,  each  critical  MIS 
activity  will  be  recorded  on  a hardcopy  printout  at  the 
division tactical operations center. Although the
hypothetical MIS function of TOS will be capable of independent
operation (i.e. only interfacing with TOS), its design, as
depicted in figure 2, will provide for the direct
functional interface through the field army communications
network with other TOS subsystems. This option is
important if the field commander is to maintain a timely
response  to  mining  operations  under  all  circumstances. 

It is planned that the MIS will ultimately facilitate
the integration of FASCAM mining capabilities into ideal
command and control barriers for optimum use by the field
commander of the future automated battlefield. This
on-line, real time data processing system will provide
information which is more accurate, more timely,
more complete, more concise, and more relevant to decision
making than the present manual technique of mine field
record keeping and marking maps. Accordingly, a safety/RAM
engineering effort which will demonstrate the safe,
effective use and control of FASCAM mines and their
associated quick response delivery systems in real time
combat operations is imperative.

3.  PROPOSED  SOFTWARE  EVALUATION  TOOL 

The  complexity  of  future  munition  systems,  such  as 
FASCAM  with  MIS,  has  prompted  SASD's  review  of  software 
reliability  to  assure  data  security,  user  protection  from 
catastrophic  system  failures,  friendly  troop  safety,  and 
the  reliability  of  decision  support  information.  This 
review  indicates  that  there  is  a nonlinear  relationship 
between  the  number  of  assigned  tasks,  with  their  respective 
priorities  and  the  number  of  system  states  possible.  In 
fact,  the  problem  is  inherent  and  is  derived  from  the 
intricate structure of precedence constraints and the
complicated  relationships  among  the  execution  times. 
Therefore,  it  can  be  shown  that  the  possible  states  of  a 
useful  software  system  grow  so  rapidly  that  there  is  no 
hope  of  examining  even  a small  fraction  of  them. 
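
To give a feel for this growth, the following Python sketch counts, by brute force, the execution orders of n tasks that respect a set of precedence constraints, and prints them next to 2^n and n!. The model is purely hypothetical (tasks numbered 0 to n-1, each constraint a pair meaning "task a must run before task b") and is not taken from the MIS design; the function name `count_orders` is likewise an illustrative assumption.

```python
from itertools import permutations
from math import factorial

def count_orders(n_tasks, precedences):
    """Count execution orders of tasks 0..n_tasks-1 that respect every
    (before, after) pair in `precedences`. Brute force, so small n only."""
    count = 0
    for order in permutations(range(n_tasks)):
        position = {task: i for i, task in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedences):
            count += 1
    return count

# One precedence constraint (task 0 before task 1) only halves the count.
print("n  2**n  n!  orders with task 0 before task 1")
for n in range(2, 9):
    print(n, 2 ** n, factorial(n), count_orders(n, [(0, 1)]))
```

Even in this toy model a single constraint removes only half of the n! orders, so the number of cases to examine still grows factorially with the number of tasks.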

Philosophy divides the understanding of truth into two
broad classes: the metaphysical and the physical.
Metaphysics is the science of understanding based upon
intuition, function, duration, i.e., getting inside of the
phenomenon. The physical understanding is usually called
analysis  and  has  been  well  defined  by  Henri  Bergson  as  "the 
operation  which  reduces  the  object  to  elements  already 
known,  that  is,  to  elements  common  both  to  it  and  other 
objects.  To  analyze,  therefore,  is  to  express  a thing  as  a 
function  of  something  other  than  itself.  All  analysis  is 
thus  a translation,  a development  into  symbols,  a 
representation  taken  from  successive  points  of  view  from 
which  we  note  as  many  resemblances  as  possible  between  the 
new  object  which  we  are  studying  and  others  which  we 
believe  we  know  already."  It,  therefore,  stands  to  reason 
that  existing  SETs  follow  this  same  philosophical 
classification  where  Monte  Carlo  simulation,  interactive 
gaming  and  deterministic  evaluation  are  metaphysical  in 
nature  (i.e.  you  play  with  the  system  until  you  get  bored) 
and  algorithmic  interrogation  techniques  comprise  the 
analytical  approach.  The  authors  believe  that  the 
analytical  approach  is  the  only  viable  method  of  evaluating 
the  inherent  workability  (correctness)  of  a system.  We 
also  believe  that  mathematical  proofs  of  correctness  are 
not  well  enough  developed  to  have  universally  accepted 
standards.  Therefore,  this  new  analytical  approach 
comprised of a standardized hardware analog is proposed.

This standardized hardware analog is called a Mechanical
Train  Analog.  It  is  a model  railroad  system  (i.e.  a 
miniature  model  train  set)  of  a sufficiently  complex 
configuration  to  perform  software  analysis.  The  synthesis 
of  the  MTA  was  stimulated  by  two  facts:  1)  the  enormous 
number  of  possible  states  and/or  functions  MIS  might 
perform,  and  2)  the  observation  that  80-90  percent  of  human 
cognition  is  achieved  through  visual  perception.  We, 
therefore,  reasoned  that  if  a visual  gage,  the  MTA,  could 
be  developed  to  display  the  cognitive  associations  of  a 
software  system  and  if  comparative  standards  could  also  be 
developed,  the  tools  necessary  for  software  analysis  would 
be  realized. 

4.  MTA  DESCRIPTION 


The search for a standardized hardware analog led us to
the world of the model builder. It became obvious
immediately that the most developed miniaturization of
human activity is the domain of the model railroader. It
must be emphasized that model railroading is a diversified
pursuit. The traditional railroader attempts to design his
layout to include all facets of worldly interactions.
Therefore, consideration is given to scenery, buildings and
other off-track activities of both a static and dynamic
nature. The nontraditional railroader is investigating the
model world as a teaching tool for slow learners, as a tool
for real time computer evaluation, as a dynamic distributed
artificial intelligence array, etc. This broad based
experience with model railroads, by individuals of widely
differing backgrounds, confirmed our hunch of the
practicability of using a model railroad as a SET.


Model railroad layouts, because of their physical
existence, are rather intolerant of human oversights. The
experimenter must address the problems of mixed hardware,
series and parallel operation, and the randomness of human
error and hardware malfunction. The hardware mixture can
be as simple as varying roadbed design and/or locomotives,
or as complex as transferring cargo from truck to train to
truck. The multiple operating subsystems that comprise a
given layout may include switch yards, subway systems,
trolleys, automated loading of coal cars, etc. The
diversity of hardware has caused many human
misunderstandings and maintenance difficulties for complex
model layouts that require more than one person for model
operation. This imperfect coordination is the human
element of the MTA and is of value in evaluating the
robustness of our analysis.

In addition to the realistic hardware complexity that model
railroads encompass, an extensive supporting model railroad
infrastructure is also available. This infrastructure
includes the standardization of scale sizes, the
availability of hardware manufactured to these standards,
magazines, clubs, contests, conventions, system review
standards, and an ever-renewing source of trained and
skilled experimenters. Therefore, we can see that the MTA
can serve as a visual evaluation gage where the
inspector/evaluator can easily see if one operation is
better than another. In addition, since the MTA operation
incorporates the software operation that controls it, an
analytical judgement of the quality of the software under
test is available.


5.  MTA  APPLICATION 

In the hope of clarifying the MTA's usefulness for
evaluating a system, a typical application will be
addressed. The example discusses some parallels between an
electric utility and the MTA. The example also discusses
how the MTA might be used to evaluate a specific electric
utility performance criterion. The reader should keep in
mind  that  this  illustrative  application  has  not  been  run  on 
the  MTA  and,  therefore,  is  not  a complete  procedure. 


A cursory  review  of  electric  utilities'  terminology 
uncovers  a similarity  with  railroad  terminology.  The 
primary elements are generating stations, both major and
minor; power lines interconnecting the stations and
customers (via substations and feeder lines); and switching
complexes which facilitate changing the systems' routing.
The utility system is stressed by equipment malfunction,
weather, customer demand, cost, maintenance schedules and
other time varying events. The above parallels in
terminology between the MTA and an electric utility
indicate that each is composed of similar elements: i.e.
fixed assets which can be connected at terminals or
switching stations via interconnecting lines to provide an
almost  infinite  variety  of  system  configurations. 


The electric utility performance criterion we have
chosen to evaluate is the prevention of the catastrophic
system crash called a blackout. A simple definition of a
blackout is a system wide malfunction which physically
damages the system and also requires a significant time to
restore service to the customers. Because of the costs
involved, blackout prevention cannot be assured by system
over capacity and/or redundancy. Therefore, most utilities
have developed load shedding schemes to prevent the
malignant spread of system malfunctions. Such a scheme has
the intended purpose of containing and isolating
malfunctions. The MTA can be used to visualize how the
translated electric utility system, including its load
shedding scheme, is affected by random malfunctions. One
analysis that might be performed is a hardware failure
modes, effects and criticality analysis where every MTA
element incorporated in the translated electric utility
system is intentionally malfunctioned and its effect on the
system is noted. It will be immediately obvious which
malfunctions propagate throughout the system causing a
total stoppage of trains (blackout). It will allow us to
investigate the effects of fault isolation schemes and of
the procedures used to restart the system.
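
The bookkeeping of such an analysis can be sketched in a few lines of Python. The network below is entirely hypothetical (two generating stations G1 and G2, two switching stations S1 and S2, two customers C1 and C2), and the rule that a customer is served when it is still connected to some working generating station is a simplifying assumption; it takes no account of capacity, load shedding, or timing, which are precisely what the MTA is meant to make visible. The function names are illustrative.

```python
from collections import deque

def served_loads(links, sources, loads, failed):
    """Loads still connected to at least one working source when the
    element `failed` is removed from the network."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    reached = {s for s in sources if s != failed}
    queue = deque(reached)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt != failed and nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return loads & reached

def failure_effects(links, sources, loads):
    """Map each non-load element to the set of loads lost when it fails."""
    elements = {node for link in links for node in link} - loads
    return {e: loads - served_loads(links, sources, loads, e)
            for e in sorted(elements)}

# Hypothetical layout: both generators feed S1; S1 feeds C1 and S2; S2 feeds C2.
links = [("G1", "S1"), ("G2", "S1"), ("S1", "C1"), ("S1", "S2"), ("S2", "C2")]
sources = {"G1", "G2"}
loads = {"C1", "C2"}
effects = failure_effects(links, sources, loads)
for element, lost in effects.items():
    status = "blackout" if lost == loads else ("partial loss" if lost else "no loss")
    print(element, sorted(lost), status)
```

In this toy layout the switching station S1 is the single element whose failure stops service to every customer, which is the kind of finding the intentional-malfunction procedure is meant to bring out.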

Therefore, it can be seen that the MTA will perform
the task of a gage, displaying the entire dynamic system
under study: hardware and software defects will show up as
nonnormal system functioning. Experience has shown that
the human mind and eye can detect irregularities, when the
entire system's interrelationships are observed, that are
normally not statistically noteworthy and, therefore, are
considered insignificant. This ability to watch the system
run will allow us to find bottlenecks and critical paths
which may have the potential for causing system
instabilities leading to catastrophic system crashes.

6.  MTA  INVESTIGATION  PLAN 

The  intuitively  appealing  hypothesis  of  the  MTA  must 
find  a mode  of  expression  and  of  application  which  conforms 
to  the  habits  of  accepted  thought,  and  one  which  furnishes 
us,  in  the  shape  of  well  defined  concepts,  with  solid 
points  of  support  if  it  is  to  be  accepted.  This  procedure 
of  review  and  analysis  has  three  primary  interrelated 
tasks:  The  first  task  is  to  develop  a standardized 
translation  procedure  to  assure  that  all  problems  of  the 
same  class  will  be  reviewed  by  the  same  standards;  the 
second  task  requires  the  establishment  of  accepted  review 
standards  for  the  purpose  of  evaluation  of  the  MTA's 
performance  when  it  is  inflicted  with  the  translated 
procedures  generated  by  the  first  task;  and  finally,  the 
third  task  is  the  establishment  of  a referee  with  the 
wisdom  to  evaluate  and  bless  tasks  one  and  two. 

In order to establish analytical confidence in the
MTA, the investigation plan we are proposing incorporates
the three tasks described above. The most important task
is that of the referee because it is necessary to verify
the other aspects of the investigation. This is too large
a task for a single individual, and a review by committee
would tend to blur the critique. Therefore, we have
selected an automated production facility as the vehicle
for verifying the proposed methodology. The use of a
production facility will permit the investigation of the
analytical insight and results generated by the MTA on a
full scale hardware system. Our investigation plan
requires the selection of a production facility which is in
the process of being debugged. After the facility has been
selected, a software model of the facility will be
generated from the facility's documentation package and
debugged by standard software methodology.

The next step in the investigation is the review and
analysis  of  the  debugged  software  model  with  the  MTA.  The 
critical  part  of  the  investigation  is  then  ready  to  be 
performed:  The  software  model  and  its  MTA  analysis  will  be 
used  to  describe  and  characterize  the  chosen  facility. 
This assessment will be used to identify which statistics
of  the  facility  are  descriptive  and  their  respective 
magnitudes.  The  investigation  then  proceeds  to  the  floor 
of  the  production  facility  to  see  if  in  fact  the  computer 
simulation  (software  system  under  test)  and  its  respective 
MTA analysis have any value. It should be obvious that
the  design  plan  for  the  production  facility  is  the  mother 
of  the  software  model  and  the  actual  production  facility 
will  be  the  referee  that  evaluates  the  MTA  analysis  of  the 
software  model. 

7.  CONCLUSION 

We  feel  that  existing  SETs  are  insufficient  for  the 
Army's  future  munition  systems.  Therefore,  we  have 
analyzed the SET deficiencies and hypothesized the MTA
to  meet  the  Army's  needs.  Experimentation  with  the  MTA  is 
practicable,  affordable,  and  analytical.  The  MTA  can 
provide  a unique  visualization  of  the  software  which  is 
open  to  widespread  scrutiny.  The  MTA  can  display  potential 
system  weaknesses  and  strengths  which  are  typically  of  a 
synergistic  nature. 

It  should  be  kept  in  mind  that:  1)  the  MTA  is  a 
proposed  solution  to  a perceived  need  and  has  not  been 
investigated  rigorously  and  2)  the  authors  did  not  intend 
to  offend  any  of  the  readers  by  what  may  be  deemed  as  a 
nonrigorous usage of some of the terminology contained
herein.  In  addition,  the  authors  would  welcome 
communication  regarding  the  proposed  MTA  as  a new  SET. 

A COMPARISON OF NASTRAN CODE AND EXACT SOLUTION
TO AN ELASTIC-PLASTIC DEFORMATION PROBLEM

U. S. Army Armament Research and Development Command
Benet Weapons Laboratory, LCWSL
Watervliet Arsenal, Watervliet, N. Y. 12189


ABSTRACT. The plate elements of the NASTRAN code are used to solve a
radially-stressed elastic-plastic annular plate problem for which an exact
solution was reported recently. A comparison of the two approaches, together
with an assessment of the NASTRAN code, is given.

1. INTRODUCTION. NASTRAN, developed under NASA sponsorship, is a large,
general-purpose computer program for structural analysis using a
finite-element displacement method approach [1]. The computer program has
been operational on an IBM 360 Model 44 at this Arsenal since May 1972. The
piecewise linear analysis option of this code can be used to analyze general
plane stress problems in the plastic range. The reliability of this code
has been well demonstrated in the linear range but not in the nonlinear
range of loadings. One major reason is that exact analytical solutions
for elastic-plastic problems are usually not available for comparison with
approximate NASTRAN solutions.

In  this  paper,  the  plate  elements  of  the  NASTRAN  code  are  used  to  solve 
an  elastic-plastic  deformation  problem  for  which  exact  solutions  were  reported 
recently.  The  problem  considered  is  an  elastic-plastic  annular  plate  radially 
stressed  by  uniform  internal  pressure.  For  ideally  plastic  materials,  the 
stress  solution  for  this  statically  determinate  problem  was  first  obtained 
by  Mises  [2]  and  the  corresponding  two  strain  solutions  were  obtained  by  the 
present  author  on  the  basis  of  both  J2  deformation  and  flow  theories  [3]. 

For elastic-plastic strain-hardening materials, an exact solution was recently
reported in [4] for the partially plastic deformation and in [5] for the fully
plastic deformation. Analytical expressions were derived on the basis of the
J2 deformation theory, Hill's yield criterion and a modified Ramberg-Osgood
law. The validity of the above solution has been established by satisfying
Budiansky's criterion.

In the following, the theory of elastic-plastic plate elements as used
in NASTRAN is briefly reviewed. In its present form, NASTRAN cannot be
used for problems involving ideally plastic materials. It is shown that
this limitation can be easily removed by making minor changes. For an
elastic-plastic strain-hardening material, the NASTRAN solution is reported
here and compared with the exact solution. The results are presented
graphically and an assessment of the NASTRAN code is made.


2. PLATE ELEMENTS. The theoretical basis of two-dimensional plastic
deformation as used in NASTRAN is that developed by Swedlow [6]. In the
development, a unique relationship between the octahedral stress, $\tau_0$,
and the plastic octahedral strain, $\varepsilon_0^p$, is assumed to exist and
the use of ideally plastic materials is excluded. The total strain components
($\varepsilon_x$, $\varepsilon_y$, $\varepsilon_z$, and $\gamma_{xy}$) are
composed of the elastic, recoverable deformations and the plastic portions
($\varepsilon_x^p$, $\varepsilon_y^p$, $\varepsilon_z^p$, and $\gamma_{xy}^p$).
The rates of plastic flow, ($\dot{\varepsilon}_x^p$, etc.), are independent of
a time scale and are simply used for convenience instead of incremental
values. The definitions of the octahedral stress and the octahedral plastic
strain rate for isotropic materials are:


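$$\tau_0 = \sqrt{\left(s_{11}^2 + 2 s_{12}^2 + s_{22}^2 + s_{33}^2\right)/3} , \qquad (1)$$

$$\dot{\varepsilon}_0^p = \sqrt{\left[(\dot{e}_{11}^p)^2 + 2(\dot{e}_{12}^p)^2 + (\dot{e}_{22}^p)^2 + (\dot{e}_{33}^p)^2\right]/3} , \qquad (2)$$

where

$$s_{11} = \tfrac{1}{3}(2\sigma_x - \sigma_y) , \qquad \dot{e}_{11}^p = \dot{\varepsilon}_x^p ,$$
$$s_{22} = \tfrac{1}{3}(2\sigma_y - \sigma_x) , \qquad \dot{e}_{22}^p = \dot{\varepsilon}_y^p ,$$
$$s_{33} = -\tfrac{1}{3}(\sigma_x + \sigma_y) , \qquad \dot{e}_{33}^p = \dot{\varepsilon}_z^p ,$$
$$s_{12} = \tau_{xy} , \qquad \dot{e}_{12}^p = \tfrac{1}{2}\dot{\gamma}_{xy}^p . \qquad (3)$$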
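Definitions (1) and (3) can be checked numerically. The following is a minimal sketch, not part of NASTRAN; the function names are hypothetical, and the effective stress uses the relation $\bar\sigma = (3/\sqrt{2})\tau_0$ that the paper adopts in its problem definition.

```python
import math

def stress_deviators(sx, sy, txy):
    """Plane-stress deviators s11, s22, s33, s12 as in Eq. (3)."""
    s11 = (2.0 * sx - sy) / 3.0
    s22 = (2.0 * sy - sx) / 3.0
    s33 = -(sx + sy) / 3.0
    s12 = txy
    return s11, s22, s33, s12

def octahedral_stress(sx, sy, txy):
    """Octahedral stress tau_0 as in Eq. (1)."""
    s11, s22, s33, s12 = stress_deviators(sx, sy, txy)
    return math.sqrt((s11**2 + 2.0 * s12**2 + s22**2 + s33**2) / 3.0)

def effective_stress(sx, sy, txy):
    """Effective stress (3/sqrt(2)) * tau_0; equals the Mises stress."""
    return 3.0 / math.sqrt(2.0) * octahedral_stress(sx, sy, txy)

uniaxial = effective_stress(100.0, 0.0, 0.0)   # uniaxial tension
pure_shear = effective_stress(0.0, 0.0, 10.0)  # pure shear
```

For uniaxial tension the effective stress reduces to the applied stress, and for pure shear to $\sqrt{3}\,\tau_{xy}$, which is the usual Mises result.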

The $s_{ij}$ is called the deviator of the stress tensor; $\sigma_x$,
$\sigma_y$ and $\tau_{xy}$ are the cartesian stresses. The isotropic material
is assumed to obey the Mises yield criterion and the Prandtl-Reuss flow rule.
The matrix relationship for the plastic flow is


1 
The plastic modulus, $M_T$, can be related to the slope, $E_T$, of the
effective stress-strain curve by

$$\frac{1}{3 M_T(\tau_0)} = \frac{1}{E_T} - \frac{1}{E} . \qquad (6)$$


The total strain increments, obtained by adding the plastic and linear
elastic parts, are:

$$\{\Delta\varepsilon\} = \left([D^p] + [G]^{-1}\right)\{\Delta\sigma\} = [G_p]^{-1}\{\Delta\sigma\} , \qquad (7)$$

where $[G]$ is the normal elastic material matrix and $[G_p]$ is the
equivalent plastic material matrix. The matrices $[D^p]$ and $[G_p]^{-1}$
exist for finite values of $M_T$ (or $E_T$), and $[G_p]$ can be obtained
numerically. Because this procedure was chosen in developing the NASTRAN
program, only strain hardening materials can be considered for applications.
However, it should be noted that even though the matrix $[G_p]^{-1}$ does not
exist when $M_T$ or $E_T$ is equal to zero, the matrix $[G_p]$ may still
exist. In fact, the closed form of $[G_p]$ has been given in [7]. We can
express it as

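$$[G_p] = \frac{E}{Q}
\begin{bmatrix}
s_{22}^2 + 2A & & \text{SYM.} \\
-s_{11}s_{22} + 2\nu A & s_{11}^2 + 2A & \\
-\dfrac{s_{11} + \nu s_{22}}{1+\nu}\, s_{12} & -\dfrac{s_{22} + \nu s_{11}}{1+\nu}\, s_{12} & \dfrac{B}{2(1+\nu)} + \dfrac{(1-\nu)E_T}{E - E_T}\,\tau_0^2
\end{bmatrix} , \qquad (8)$$

where

$$B = s_{11}^2 + 2\nu s_{11} s_{22} + s_{22}^2 ,$$

$$Q = 2(1-\nu^2)A + B .$$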
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If  we  want  to  remove  the  limitation  that  the  use  of  ideally  plastic  materials 
is  excluded,  we  have  to  make  minor  changes  in  subroutines  PSTRM  and  PKTRM  of 
the  NASTRAN  program. 

3. PROBLEM DEFINITION. The finite element model, shown in Figure 1,
utilizes the condition of axial symmetry so only a sector of the annular plate
is modeled. All membrane elements use stress dependent materials. The
effective stress-strain curve is defined by

$$\bar\sigma = E\bar\varepsilon \quad \text{for } \bar\sigma \le \sigma_0 ,$$

$$(\bar\sigma/\sigma_0)^n = (E/\sigma_0)\,\bar\varepsilon \quad \text{for } \bar\sigma > \sigma_0 , \qquad (10)$$
and a lack of understanding of the scoring criteria
applicable to the software system.
where

$$\bar\sigma = (3/\sqrt{2})\,\tau_0 , \qquad \bar\varepsilon = \bar\sigma/E + \sqrt{2}\,\varepsilon_0^p , \qquad (11)$$

$n$ is the strain hardening parameter and the initial yield surface is defined
by the ellipse $\bar\sigma = \sigma_0$. The input parameters for the problem
are

a = 1.0 inch (inner radius)
b = 2.0 inch (outer radius)
$\theta_0$ = 90/17 degree (sector)
t = 0.1 inch (thickness)
E = 10.5 x 10^6 psi
$\nu$ = 0.3 (Poisson's ratio)
$\sigma_0$ = 5.5 x 10^4 psi
n = 9


All grid points are constrained in the tangential direction. The applied
load is the internal pressure $p$. If $p$ = 1000 psi, the equivalent nodal
force at each of the two interior grids is $Q = a p t \tan(\theta_0/2)$ =
4.62329 pound in the radial direction. The true pressure corresponding to
initial yielding for this problem is $p_0$ = 23,571.35 psi. Four sets of load
factors are used. The load increments for three of them are uniform with
$\Delta p/p_0$ = 0.20, 0.10, 0.05, respectively. The load factors for the
fourth set are obtained from the analytical solution, and each of the first
nine load factors is supposed to cause one more element to become plastic.
A complete input deck acceptable to the NASTRAN program, which fully defines
the problem, is shown in Table 1.
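The nodal force quoted above can be reproduced directly from the input parameters. The sketch below does that and, as an independent check that is an assumption of this example rather than the author's stated calculation, estimates the first-yield pressure from the Lamé elastic solution combined with the Mises criterion in plane stress.

```python
import math

# Input parameters from the problem definition
a, b, t = 1.0, 2.0, 0.1            # inner radius, outer radius, thickness (inch)
theta0 = math.radians(90.0 / 17.0) # sector angle
sigma0 = 5.5e4                     # initial yield stress (psi)
p = 1000.0                         # reference internal pressure (psi)

# Equivalent radial nodal force at each of the two interior grid points
Q = a * p * t * math.tan(theta0 / 2.0)

# Assumed check: Lame stresses at r = a are sigma_r = -p, sigma_theta = k*p,
# and Mises yielding requires p * sqrt(k^2 + k + 1) = sigma0.
k = (b**2 + a**2) / (b**2 - a**2)
p_yield = sigma0 / math.sqrt(k**2 + k + 1.0)

print(f"Q = {Q:.5f} lb, Lame/Mises first-yield pressure = {p_yield:.2f} psi")
```

The estimate $3\sigma_0/7 \approx 23{,}571.4$ psi is close to, though not identical with, the value 23,571.35 psi quoted in the text.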


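4. RESULTS AND DISCUSSIONS. The outputs of the NASTRAN program are
the displacements and stresses for each load factor. The information for
the strains is not available. The stress results are $\sigma_x$, $\sigma_y$
and $\tau_{xy}$ at the centroid of each element. The stress components in
polar coordinates can be calculated by

$$\sigma_r = \tfrac{1}{2}(\sigma_x + \sigma_y) + \tfrac{1}{2}(\sigma_x - \sigma_y)\cos 2\theta + \tau_{xy}\sin 2\theta ,$$

$$\sigma_\theta = \tfrac{1}{2}(\sigma_x + \sigma_y) - \tfrac{1}{2}(\sigma_x - \sigma_y)\cos 2\theta - \tau_{xy}\sin 2\theta ,$$

$$\tau_{r\theta} = -\tfrac{1}{2}(\sigma_x - \sigma_y)\sin 2\theta + \tau_{xy}\cos 2\theta . \qquad (12)$$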

The  NASTRAN  results  will  depend  on  the  user's  choices  of  element  sizes  and 
load  increments.  In  general,  a user  can  get  better  results  at  a higher  cost 
by  using  more  elements  and  smaller  load  increments.  But  there  are  still 
other  sources  of  built-in  errors  in  the  program. 
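The conversion of the centroidal cartesian stresses to polar components is a plane rotation of the stress tensor. A minimal sketch follows; the function name is hypothetical and the angle is taken in radians.

```python
import math

def polar_stresses(sx, sy, txy, theta):
    """Rotate plane cartesian stresses into (sigma_r, sigma_theta, tau_rtheta)."""
    c2, s2 = math.cos(2.0 * theta), math.sin(2.0 * theta)
    mean = 0.5 * (sx + sy)
    half_diff = 0.5 * (sx - sy)
    sigma_r = mean + half_diff * c2 + txy * s2
    sigma_t = mean - half_diff * c2 - txy * s2
    tau_rt = -half_diff * s2 + txy * c2
    return sigma_r, sigma_t, tau_rt

# Example: element centroid on the mid-line of the 90/17 degree sector
theta_c = math.radians(90.0 / 17.0) / 2.0
sr, st, trt = polar_stresses(-1000.0, 1600.0, 0.0, theta_c)
```

The sum $\sigma_r + \sigma_\theta$ equals $\sigma_x + \sigma_y$ for any angle, which gives a quick sanity check on hand-converted output.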

In  the  elastic  range,  the  NASTRAN  results  based  on  two  finite-element 
models  were  compared  with  the  exact  solution.  For  a ten-element  model, 
the  maximum  error  is  0.36%  in  displacements  and  0.95%  in  stresses.  For 
a twenty-element  model,  the  maximum  errors  are  reduced  to  0.09%  and  0.24%  , 
respectively.  Since  we  are  satisfied  with  1%  error,  the  ten-element  model 
as  shown  in  Figure  1 is  chosen. 
In the plastic range, the user has to choose the load increments
properly in order to obtain good results at reasonable cost. The values
of the load factors can be normalized so that a unit value corresponds to the
limit of the elastic solution, i.e., $p_0$ = 23,571 psi. The load increments
can be uniform or nonuniform. It seems that the appropriate size of the load
increments depends on the material properties and the sizes of the elements.
In order to determine the influence of load factors on the displacements and
stresses in the plastic range, four sets of load factors are chosen. The load
increments for three of them are uniform with $\Delta p/p_0$ = 0.20, 0.10,
0.05, respectively.

The  influence  of  load  factor,  p/p0,  on  the  inside  radial  displacement,  Uj, 
is  shown  in  Figure  2.  The  effect  of  load  factors  on  the  major  principal 
stresses  in  element  1 is  shown  in  Figure  3.  We  also  show  in  these  two 
figures  the  corresponding  analytical  results.  On  the  basis  of  these 
comparisons, we can draw the following conclusions. In the earlier stages
of plastic deformation, larger load increments can be used and still give very
good results. As the plastic deformation becomes larger, smaller load
increments should be used in order to get reasonably good answers. However,
for very large plastic deformation, it seems that the NASTRAN results cannot
be improved much further by choosing even smaller increments. This is because
there are other built-in errors in the NASTRAN program, e.g., a linear
displacement function is assumed.

If uniform load increments are used, a direct comparison of the
analytical and NASTRAN results is not available. The solid curves shown in
Figures 2 and 3 were obtained indirectly since the analytical results for
the displacements, stresses and pressure were represented as functions of
the elastic-plastic boundary [4,5]. We have obtained the analytical results
when the elastic-plastic boundary is located at a radial distance of 0.05,
0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95 inch from the inside
surface. The corresponding values of the pressure factor ($p/p_0$) are
1.094776, 1.265848, 1.413734, 1.540074, 1.646427, 1.734275, 1.805022,
1.860089, 1.900829, 1.928619, respectively. This set of values is used
as load factors in the input deck of the NASTRAN program. Some of the
NASTRAN results together with the corresponding analytical results are
shown in Figures 4 to 7. A direct comparison of the two approaches in the
plastic range can be seen. The effect of load factors on the distribution
of radial displacements is shown in Figure 4. The stress results of the
NASTRAN program are $\sigma_x$, $\sigma_y$, $\tau_{xy}$, two principal
stresses, the principal stress angle and the maximum shear at the centroids
of all elements. The distributions of major and minor principal stresses for
three load factors are shown in Figures 5 and 6, respectively. The effect of
the load factors on the errors in principal stress angles is shown in
Figure 7. If there is no error in the principal stress angle, the major and
minor principal stresses will be the tangential and radial stresses,
respectively. As can be seen in Figures 4 to 7, a direct comparison of the
two approaches supports the following conclusion: even if the results in the
elastic range are in excellent agreement, the differences in the plastic
range can be quite big for large values of load factors. This suggests that
more research effort should be given to large plastic deformation. It also
remains an open question how much improvement over the present model can be
obtained in the plastic range by using more elements. An alternative
approach is to use higher order isoparametric elements.
In this paper, emphasis has been placed on the limitations of the NASTRAN
program so that we can make some suggestions for improvements and additions
of new capability. However, this code in its present form is still a valuable
tool because it can be used to solve quite complicated plane stress problems
provided the plastic deformation involved is not too large. Furthermore, the
NASTRAN program is widely used and well documented, so we can expect more
improvements and additions in the future, either by us or by others.
analytical approach is the only viable method of evaluating
the inherent workability (correctness) of a system. We
also believe that mathematical proofs of correctness are
not well enough developed to have universally accepted
standards. Therefore, this new analytical approach
comprised of a standard hardware analog is proposed
TABLE 1. LIST OF INPUT DECK FOR A RADIALLY STRESSED ANNULAR PLATE

ID      ANNULAR PLATE UNDER UNIFORM INTERNAL PRESSURE
APP     DISPLACEMENT
SOL     6,0
TIME    30
CEND
TITLE = ELASTIC-PLASTIC ANNULAR PLATE
SUBT = SECTOR OF 90/17 DEGREE WITH GEOMETRIC RATIO 2.
LABEL = RAMBERG-OSGOOD LAW
LOAD = 200
SPC = 300
MPC = 350
PLCOEFF = 400
STRESS = ALL
DISP = ALL
SPCF = ALL
BEGIN BULK
CORD2C  1000            0.000   0.0     0.0     0.000   0.0     1.000   CC2C
CC2C    1.000   0.0     1.000
CQDMEM  1       11      1       3       4       2
CQDMEM  2       11      3       5       6       4
CQDMEM  3       11      5       7       8       6
CQDMEM  4       11      7       9       10      8
CQDMEM  5       11      9       11      12      10
CQDMEM  6       11      11      13      14      12
CQDMEM  7       11      13      15      16      14
CQDMEM  8       11      15      17      18      16
CQDMEM  9       11      17      19      20      18
CQDMEM  10      11      19      21      22      20
FORCE   201     1       1000    4.62329 1.
FORCE   202     2       1000    4.62329 1.
GRDSET
GRID    1       1000    1.0     0.
GRID    2       1000    1.0     5.294118
GRID    3       1000    1.1     0.
GRID    4       1000    1.1     5.294118
GRID    5       1000    1.2     0.
GRID    6       1000    1.2     5.294118
GRID    7       1000    1.3     0.
GRID    8       1000    1.3     5.294118
GRID    9       1000    1.4     0.
GRID    10      1000    1.4     5.294118
GRID    11      1000    1.5     0.
GRID    12      1000    1.5     5.294118
GRID    13      1000    1.6     0.
GRID    14      1000    1.6     5.294118
GRID    15      1000    1.7     0.
GRID    16      1000    1.7     5.294118
GRID    17      1000    1.8     0.
GRID    18      1000    1.8     5.294118
GRID    19      1000    1.9     0.
GRID    20      1000    1.9     5.294118
GRID    21      1000    2.0     0.
GRID    22      1000    2.0     5.294118
LOAD    200     23.57135 1.     201     1.      202
MAT1    5       1.05E7          .3
MATS1   5       500
MPC     351     2       2       .995734 2       1       -.092268
MPC     352     4       2       .995734 4       1       -.092268
MPC     353     6       2       .995734 6       1       -.092268
MPC     354     8       2       .995734 8       1       -.092268
MPC     355     10      2       .995734 10      1       -.092268
MPC     356     12      2       .995734 12      1       -.092268
MPC     357     14      2       .995734 14      1       -.092268
MPC     358     16      2       .995734 16      1       -.092268
MPC     359     18      2       .995734 18      1       -.092268
MPC     360     20      2       .995734 20      1       -.092268
MPC     361     22      2       .995734 22      1       -.092268
MPCADD  350     351     352     353     354     355     356     357     CMPC1
CMPC1   358     359     360     361
PLFACT  400     1.0947761.2658481.4137341.5400741.6464271.7342751.805022CPL1
CPL1    1.8600891.9008291.95    2.      2.05    2.1
PQDMEM  11      5       .1
SPC1    300     2       1       3       5       7       9       11      CPC1
CPC1    13      15      17      19      21

IABLES1 
LIAB  0 
LIAB  1 
L TAB 
C T AB 
LIAB 
LIAB 
LIAB 
LIAB 
LIAB  6 
f.TAB  9 
CIAB10 
CTAB1 1 
E NDDa I A 


500 

.0  0 

.0066/33  565  00 
• 009 12  66  58600 


0.0052381  55000.0.0056826  55500.0.0061603 
0.007225]  5 1000. 0.00/8188  5 /SOO . 0 . 0085 58  1 
0.0098532  59000.0.0106306  59500 . 0 . 0 1 1 5 6 2 3 
.0123511  60500. 0.01 33008  61000.0.0153157  61500.0.0153969 
,0165511  62500.0.0177817  6 3000 . 0 . 0 1 909 26  6 3 500 . 0 . 0 205 89 3 
0.0235575  65C00 . 0 . 0 25 2 39 3 65500.0.0270278 
0.0309883  67000.0.0330857  67500.0.0353581 
.0377678  68500.0.0803227  69000.0.0530298  69500.0.0558976 
.0885338  70500.0.0521878  71000.0.0555871  71500.0.0591826 

72500.0.0669597  73000.0.0717020  73500.0.0756821 

.0858002  75000.  .0967172  76000.  .1087281 


.0219758  68500 
.0789269  665C0 


. 0629832 
.0808105  78500. 
ENOT 


357LMPC1 
.805077CPI  1 


11LPC1 

LIAB  0 
56000. OLIAB  1 
58000. OLlAb  2 
60000. 0LIA8  3 
67000. OLIAB  8 
68000 .061 AB  5 
66000.  OL  I AB  6 
68C00. OLIAB  7 
70000. OLIAB  8 
77000. OLIAB  9 
75000. OLI AB  10 
77000.  LIAB11 
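The strain-stress pairs on the TABLES1 card appear to be samples of the effective stress-strain law (10): the legible entries 0.0052381 at 55000, 0.0056826 at 55500 and 0.0061603 at 56000 psi agree with it. The sketch below regenerates such a table. The stress grid used here (the origin, 55000 to 75000 psi in steps of 500, then 76000 and 77000) is an assumption inferred from the legible stress entries, not a verified transcription of the card.

```python
E = 10.5e6        # Young's modulus (psi)
SIGMA_0 = 5.5e4   # initial yield stress (psi)
N_HARD = 9        # strain hardening parameter n

def effective_strain(sigma):
    """Effective strain from Eq. (10) for a given effective stress."""
    if sigma <= SIGMA_0:
        return sigma / E
    return (SIGMA_0 / E) * (sigma / SIGMA_0) ** N_HARD

# Assumed stress grid (see lead-in); each row is (strain, stress)
stresses = [0.0] + [55000.0 + 500.0 * i for i in range(41)] + [76000.0, 77000.0]
table = [(effective_strain(s), s) for s in stresses]

for strain, stress in table[:4]:
    print(f"{strain:.7f} {stress:8.0f}")
```

Under this assumption the table has 44 pairs, which would fill eleven continuation cards of four pairs each.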
FIGURE 4. DISTRIBUTION OF RADIAL DISPLACEMENTS

FIGURE 5. DISTRIBUTION OF MAJOR PRINCIPAL STRESSES

FIGURE 7. ERRORS IN PRINCIPAL STRESS ANGLES
ABSTRACT. In connection with efforts to utilize the CRAY-1 computer
efficiently, we present some methods of analysis of rates of convergence for
block iterative methods applied to the model problem. One of the more
interesting methods involves relaxing p x p blocks of points. A Cholesky
decomposition is used for that smaller problem. One of the basic methods of
analysis is a modification of a method discussed earlier by Parter. This
analysis easily extends to more general second order elliptic problems.

1. INTRODUCTION. Some 15-20 years ago there was great interest in
iterative methods for elliptic difference equations - see [13], [14], [15],
[7], [9], [10]. More recently there has been a greater emphasis on direct
methods for these sparse matrices - see [5], [6], [11], [12].

However, with the advent of "vector machines" and "parallel processors" we
have found it necessary to return to a consideration of certain iterative
methods.

The  CRAY-1  computer  can  perform  up  to  250  million  floating  point  operations 
per  second  [2].  Algorithms  that  execute  with  high  arithmetic  efficiency  on  this 
computer  must  "fit  the  architecture"  of  it  and  be  carefully  programmed  in  assembly 
language.  Thus  in  using  this  computer,  we  seek  computational  modules  that  can 
be  implemented  efficiently  and  that  can  be  used  in  solving  diverse  problems. 

The  solution  of  banded  positive  definite  linear  systems  is  such  a module,  and 
the  Cholesky  decomposition  algorithm  for  it  can  be  implemented  on  the  CRAY-1 
such  that  its  execution  proceeds  at  the  rate  of  about  100  million  floating  point 
operations  per  second.  Since  the  vector  registers  of  the  CRAY-1  can  hold  at  most 
64  numbers,  implementation  of  banded  Cholesky  is  simplified  if  vector  lengths  do 
not exceed 64. Block relaxation techniques for solving elliptic difference
approximations require the solution of banded positive definite linear
systems. These facts led us to investigate the convergence rate of block
successive overrelaxation for the model problem using p x p blocks
(preferably p <= 64).


Sponsored by the United States Army under Contract No. DAAG29-75-C-0024; by
the Los Alamos Scientific Laboratory under Contract No. W-7405-ENG-36; and by
the Office of Naval Research under Contract No. N00014-76-C-0341.
In [9] one of us developed a fairly general theory for obtaining such
estimates on the rates of convergence of iterative methods for elliptic
difference equations. However, partly because of the generality of that work
(variable coefficients, general domains, etc.) it is by no means a
transparent discussion. On the other hand, in the case of the model problem
it is relatively easy to develop this general approach. This is partly due to
the strong estimates of [1] and [8].

In section 2 we describe the model problem and iterative methods for its
solution. In section 3 we develop the general theory (for this special case).
In section 4 we obtain the rates of convergence estimates for the p x p block
method mentioned earlier. In section 5 we apply the theory to the multi-line
methods (these methods have been studied earlier [9], [10]).

Finally, because it is worthwhile for the practical worker to have available
many methods for getting information on rates of convergence (some work
here, others there), in section 6 we return to multi-line methods. Using a
completely different technique we obtain upper bounds (unfortunately, not
sharp bounds) on the rates of convergence.

2. THE MODEL PROBLEM. Let

2.1) $\Omega \equiv \{(x,y);\ 0 < x, y < 1\}$ .

Let $P$ be a fixed integer and set

$$h = \frac{1}{P+1} .$$

Consider the set of interior mesh points

$$\Omega(h) = \{(x_k, y_j) = (kh, jh)\}, \quad 1 \le k, j \le P$$

as well as the boundary mesh points

2.3) $\partial\Omega(h) = \{(x_k, y_j);\ (k = 0 \text{ or } P+1) \text{ or } (j = 0 \text{ or } P+1)\}$ .

Let $U = \{u_{kj}\}$ be a vector defined on the set of all grid points
$\Omega(h) \cup \partial\Omega(h)$; that is, $u_{kj}$ is the value of $U$ at
$(x_k, y_j)$. We call $U$ a "grid vector". In differing circumstances we will
choose different orderings of the components of $U$.

As usual, we define the discrete Laplace operator by

2.4a) $(\Delta_h U)_{kj} = \dfrac{u_{k+1,j} - 2u_{kj} + u_{k-1,j}}{h^2} + \dfrac{u_{k,j+1} - 2u_{kj} + u_{k,j-1}}{h^2}$ , $\quad 1 \le k, j \le P$ .

Note: While $U$ is defined on the entire mesh-region, $\Delta_h U$ is defined
only on $\Omega(h)$, the interior. Also, we define the difference operators

2.4b) $[\nabla_x U]_{k,j} = \dfrac{u_{k,j} - u_{k-1,j}}{h}$ , $\quad 1 \le j \le P,\ 1 \le k \le P+1$ ,

2.4c) $[\nabla_y U]_{k,j} = \dfrac{u_{k,j} - u_{k,j-1}}{h}$ , $\quad 1 \le j \le P+1,\ 1 \le k \le P$ .


The basic problem is: Given grid vectors $F$ and $G$, find a grid vector
$U$ such that

2.5a) $\Delta_h U = F$, in $\Omega(h)$ ,

2.5b) $U = G$, on $\partial\Omega(h)$ .

After an ordering of the points $(x_k, y_j)$ is determined we let $A$ be the
matrix representation of $-h^2\Delta_h$; symbolically, we write

2.6) $A \sim -h^2\Delta_h$ .

As we have already remarked, $\Delta_h$ maps vectors with $P^2 + 4P$
components into vectors with $P^2$ components. The matrix $A$ actually is a
square $P^2$ by $P^2$ matrix. The known boundary values, $G$, are put on the
right-hand-side. In this way the difference equations (2.5a), (2.5b) take the
form

2.7) $AV = \tilde F$

where the ~ over $F$ is meant to indicate both the result of ordering the
components of $-h^2 F$ and the necessary modifications of $F$ required by the
$G$ terms. In any case, every vector $V$ with $P^2$ components may be thought
of as a grid vector which also satisfies

2.8) $V = 0$ on $\partial\Omega(h)$ .

An iterative method for the solution of (2.7) is determined by a "splitting"

2.9a) $A = M - N$ .

Equation (2.7) is then

2.9b) $MV = NV + \tilde F$ .

After choosing a first guess $V^0$, one obtains $V^1, V^2, \ldots$ from

2.10) $MV^{\nu+1} = NV^{\nu} + \tilde F$ .

Let

2.11) $\rho = \max\{|\lambda|;\ \det(\lambda M - N) = 0\}$ .

It is well known that the iterates $V^{\nu}$ converge to the unique solution
$V$ of (2.7) if and only if (independently of $V^0$)

2.12) $\rho < 1$ .
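The splitting iteration (2.10) and the quantity $\rho$ of (2.11) can be illustrated on the model problem. The sketch below uses the point Jacobi splitting ($M$ = diagonal of $A$) purely as an assumed example, since for it $\rho = \cos \pi h$ is known in closed form; the helper names are hypothetical and dense NumPy matrices are used for brevity, not efficiency.

```python
import numpy as np

def model_matrix(P):
    """Matrix A ~ -h^2 * Laplacian_h on a P x P interior grid (natural ordering)."""
    T1 = 2.0 * np.eye(P) - np.eye(P, k=1) - np.eye(P, k=-1)
    I = np.eye(P)
    return np.kron(I, T1) + np.kron(T1, I)

def iterate(M, N, F, steps):
    """Run M V^{nu+1} = N V^nu + F from the first guess V^0 = 0."""
    V = np.zeros_like(F)
    for _ in range(steps):
        V = np.linalg.solve(M, N @ V + F)
    return V

P = 7
h = 1.0 / (P + 1)
A = model_matrix(P)
M = np.diag(np.diag(A))          # point Jacobi splitting (assumed example)
N = M - A
# rho = largest |lambda| with det(lambda*M - N) = 0
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(M, N))))
F = np.ones(P * P)
V = iterate(M, N, F, 400)
residual = np.linalg.norm(A @ V - F)
print(rho, np.cos(np.pi * h), residual)
```

Since $\rho < 1$ here, the residual of (2.7) decays roughly like $\rho^{\nu}$, in line with the convergence criterion (2.12).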
The problem studied in this report is: find the asymptotic behaviour of
$\rho$ as $h \to 0$.

Remark: Of course, for every $\lambda$ which is a generalized eigenvalue
(i.e. $\det(\lambda M - N) = 0$) there is a vector $U \ne 0$ such that

2.13) $\lambda MU = NU$ .

3. A GENERAL APPROACH. We make some assumptions about the splitting (2.9a).

A.1) $M = M^*$ and is positive definite.

A.2) $\rho = \max\limits_{x \ne 0} \dfrac{\langle Nx, x\rangle}{\langle Mx, x\rangle}$ ,

where

$$\langle x, y\rangle = x^T \bar y = \sum x_{kj}\,\bar y_{kj} .$$

Note: Since $A = A^*$, $M = M^*$ then $N = N^*$; and, as is well-known [4], the
generalized eigenvalues are all real and

$$\rho = \max_{x \ne 0} \frac{|\langle Nx, x\rangle|}{\langle Mx, x\rangle} .$$

Thus, the force of the assumption (A.2) is that $\max|\lambda|$ occurs for a
positive eigenvalue $\lambda = \rho$.

A.3) There is a positive constant $N_0$, independent of $h$, such that

$$\|N\|_\infty \le N_0 ,$$

where

$$\|N\|_\infty = \sup\{|(NU)_{kj}|;\ |u_{kj}| \le 1\} .$$

Finally we come to the main new concept.

A.4) There are positive constants $q$, $K$, independent of $h$, such that: if
$U$ is a grid vector satisfying

(i) $U = 0$ on $\partial\Omega(h)$ ,

(ii) $|\nabla_x U| + |\nabla_y U| \le B$

for some constant $B$, then

3.1) $\langle NU, U\rangle = q\langle U, U\rangle + E$

where

3.1a) $|E| \le KB/h$ .

Remark: As one might imagine, the determination of $q$ and the verification
of (A.4) is the important technical aspect of this analysis when applied to
any particular case. However, as we shall see in sections 4 and 5, it is not
too difficult.

Lemma 3.1: Suppose the splitting (2.9a) satisfies (A.1) and (A.2). Then the
method is convergent. That is,

3.2) $\rho < 1$ .

Proof: Let $U$ be the eigenvector associated with $\rho$. Then
$\langle NU, U\rangle \ge 0$. Since $M = A + N$ and $A$ is positive definite,
we have

$$0 \le \rho = \frac{\langle NU, U\rangle}{\langle MU, U\rangle} = \frac{\langle NU, U\rangle}{\langle AU, U\rangle + \langle NU, U\rangle} < 1 .$$
The  basic  result  of  this  section  is 

Theorem 3.1: Suppose the splitting (2.9a) satisfies the conditions (A.1),
(A.2), (A.3) and (A.4). Then

3.3) $\rho = 1 - \dfrac{2\pi^2}{q}\,h^2 + O(h^3)$ .

Proof: Let $U$ be the grid vector

3.4) $u_{kj} = (\sin k\pi h)(\sin j\pi h)$ .

Then $U$ satisfies conditions (i), (ii) of (A.4). In particular, because of
(i) we may speak of $\langle NU, U\rangle$ and $\langle U, U\rangle$. The
constant $B$ of (ii) is $2\pi$. The following facts are well known (see [13],
particularly page 202).

3.5) $h^2\langle U, U\rangle = \tfrac{1}{4}$ ,

3.6) $\dfrac{h^2\langle AU, U\rangle}{h^2\langle U, U\rangle} = 4(1 - \cos \pi h) = 2\pi^2 h^2\left[1 - \tfrac{1}{12}(\pi h)^2 + O(h^4)\right]$ .

For all $V$ which are zero on $\partial\Omega(h)$ and $V \ne 0$,

3.7) $\dfrac{-h^2\langle \Delta_h V, V\rangle}{h^2\langle V, V\rangle} = \dfrac{\langle AV, V\rangle}{h^2\langle V, V\rangle} \ge 2\pi^2\left[1 - \tfrac{1}{12}(\pi h)^2 + O(h^4)\right]$ .
Since $M = A + N$,

$$\rho \ge \frac{\langle NU, U\rangle}{\langle MU, U\rangle} = \frac{h^2\langle NU, U\rangle}{h^2\langle AU, U\rangle + h^2\langle NU, U\rangle} .$$

Applying (A.4) we have

$$h^2\langle NU, U\rangle = q\,[h^2\langle U, U\rangle] + h^2 E .$$

And, using (3.5) we have

$$h^2\langle NU, U\rangle = [q + O(h)]\,[h^2\langle U, U\rangle] .$$

Thus

$$\rho \ge \frac{1}{1 + \dfrac{h^2\langle AU, U\rangle}{[q + O(h)]\,[h^2\langle U, U\rangle]}} .$$

Using (3.6) we obtain

3.8) $\rho \ge 1 - \dfrac{2\pi^2}{q}\,h^2 + O(h^3)$ .
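The two "well known" facts about the trial vector $u_{kj} = (\sin k\pi h)(\sin j\pi h)$ used in this proof are easy to confirm numerically. The sketch below applies the five-point operator $A \sim -h^2\Delta_h$ directly on the grid; the function name is hypothetical.

```python
import numpy as np

def trial_vector_facts(P):
    """Return (h^2 <U,U>, <AU,U>/<U,U>, 4(1 - cos(pi h))) for u_kj = sin(k pi h) sin(j pi h)."""
    h = 1.0 / (P + 1)
    k = np.arange(1, P + 1)
    s = np.sin(k * np.pi * h)
    U = np.outer(s, s)                      # interior values only
    Z = np.pad(U, 1)                        # zero boundary values, as in (i)
    AU = (4.0 * U - Z[:-2, 1:-1] - Z[2:, 1:-1]
          - Z[1:-1, :-2] - Z[1:-1, 2:])     # five-point stencil, A ~ -h^2 * Laplacian_h
    uu = np.sum(U * U)
    return h * h * uu, np.sum(AU * U) / uu, 4.0 * (1.0 - np.cos(np.pi * h))

norm_sq, rayleigh, exact = trial_vector_facts(15)
print(norm_sq, rayleigh, exact)
```

The first value is $1/4$ for every $P$, and the Rayleigh quotient equals $4(1 - \cos \pi h) \approx 2\pi^2 h^2$, which is what drives the $h^2$ term in the estimate for $\rho$.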

In order to obtain the reverse inequality we require some basic estimates
of [1] and [8]. These are

Lemma 3.2: Let $V$ be a grid vector which is zero on $\partial\Omega(h)$.
Then

1/2 

3.9) 

|vkjl 

+ it  {h  (A.V,A.V>} 
h h 

Proof: See lemma 8, page 304 of [8].

Lemma 3.3: Let $V$ be a grid vector which is zero on $\partial\Omega(h)$.
Then

3.10a) $|\nabla_x V| \le \max |\Delta_h V|$ ,

3.10b) $|\nabla_y V| \le \max |\Delta_h V|$ .

Proof: This result is contained in Theorem 5, page 488 of [1].

For convenience of notation, for every grid vector $V$, restricted to the
interior $\Omega(h)$, we write

3.11) $\|V\|_g = \{h^2\langle V, V\rangle\}^{1/2}$ .
Returning to the proof of the theorem, let $U$ be the eigenvector
associated with $\rho$ and normalized so that

3.12) $\|U\|_g = 1$ .

Then

$$\rho MU = NU ,$$

$$\rho AU = \rho(M - N)U = (1 - \rho)NU .$$

That is

3.13a) $-\Delta_h U = \mu NU$

where

3.13b) $\mu = (1 - \rho)/\rho h^2$ .

From lemma 3.1 and (3.8) we see that

3.14a) $0 < \mu$

and

3.14b) $\limsup\limits_{h \to 0} \mu \le \dfrac{2\pi^2}{q}$ .

Moreover, the theorem will be proven if we show that

$$\mu = \frac{2\pi^2}{q} + O(h) .$$

We write (3.13a) as


where, if h is small enough


Applying lemma 3.2 we see that


"V  * * 


< — — - N ■ N.  . 
g - q 0 1 


Kjl  ij-TTT.! 


Thus 


l»l.J  I i V "o'?  V ' ",  • 
Applying lemma 3.3 we have (ii) of (A.4) with $B = 2N_1$. Hence, using (A.4)
we have

$$h^2\langle NU, U\rangle = q\,(h^2\langle U, U\rangle) + h^2 E .$$

Or, making use of (3.12),

$$h^2\langle NU, U\rangle = [q + O(h)]\,(h^2\langle U, U\rangle) .$$

From (3.13a) we have

$$-h^2\langle \Delta_h U, U\rangle = \mu h^2\langle NU, U\rangle = \mu\,[q + O(h)]\,(h^2\langle U, U\rangle) .$$

Hence, from (3.7)

$$2\pi^2[1 + O(h^2)] \le \mu\,(q + O(h)) .$$

Thus, combining this result with (3.14b), the theorem is proven.

4.  P * p BLOCKS.  Let  p be  a fixed  integer  and  assume  that 

4.1)  P - PC  . 

Of  course,  as  P ■*  • (i.e.  h -*  0)  Q ■*  •»  and  vice-versa. 

The  Interior  grid  vector  is  arranged  into  sub  grid  vectors  (U  ) of  p' 
entrees  as  follows  r“ 

4-2)  °r.  " {u(r-l)pto.(s-l)p*W'  1 - 0(11  - Pl'  1 i r*'  I ° * 

Within  U,  the  are  ordered  as  follows 

4-3)  U - (UU.U21.  ...  UQ1.U12.U22.  ...  Uq2,  ...  01Q.  ...  . 

That is, we start at the bottom row of p × p blocks and count off from left to
right; then to the next (second) row of p × p blocks, again from left to
right, etc. Within each block ($U_{rs}$) the subgrid vector is ordered in the same
manner. To be specific, let $G(r,s,\mu)$, $\mu = 1, 2, \ldots, p$, be the p-vector of grid
values associated with the $\mu$-th horizontal line within the (r,s) block. That
is

4.4)  $G(r,s,\mu) = \begin{bmatrix} u_{(r-1)p+1,\,(s-1)p+\mu} \\ u_{(r-1)p+2,\,(s-1)p+\mu} \\ \vdots \\ u_{(r-1)p+\sigma,\,(s-1)p+\mu} \\ \vdots \\ u_{rp,\,(s-1)p+\mu} \end{bmatrix}$ ,

then

4.5)  $U_{rs} = \begin{bmatrix} G(r,s,1) \\ \vdots \\ G(r,s,\mu) \\ \vdots \\ G(r,s,p) \end{bmatrix}$ .
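
The ordering (4.2)-(4.5) can be made concrete with a short Python sketch. The function names and the 0-based output position are assumptions made for this illustration; they are not notation from the text.

```python
def block_position(i, j, p, Q):
    """0-based position of grid value u[i, j] (1 <= i, j <= p*Q) in U.

    Blocks U_rs are taken row by row (s = 1..Q, with r = 1..Q inside a row);
    inside a block the lines G(r, s, mu) are taken for mu = 1..p, and inside
    a line the horizontal index sigma runs over 1..p.
    """
    r, sigma = (i - 1) // p + 1, (i - 1) % p + 1   # i = (r-1)p + sigma
    s, mu = (j - 1) // p + 1, (j - 1) % p + 1      # j = (s-1)p + mu
    block = (s - 1) * Q + (r - 1)                  # U_11, U_21, ..., U_Q1, U_12, ...
    within = (mu - 1) * p + (sigma - 1)            # G(r,s,1), ..., G(r,s,p)
    return block * p * p + within


def block_ordering(p, Q):
    """Grid indices (i, j) listed in the order in which they appear in U."""
    P = p * Q
    order = [None] * (P * P)
    for j in range(1, P + 1):
        for i in range(1, P + 1):
            order[block_position(i, j, p, Q)] = (i, j)
    return order
```

For p = 2 and Q = 2 the first block $U_{11}$ holds the points (1,1), (2,1), (1,2), (2,2) in that order, and the next entry, (3,1), opens the block $U_{21}$.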

The discrete Poisson equation (2.5a), (2.5b) takes the form

4.6)  $T U_{rs} - A_{-1} U_{r-1,s} - A_{1} U_{r+1,s} - B_{-1} U_{r,s-1} - B_{1} U_{r,s+1} = F_{rs}$

where $T$, $A_{-1}$, $A_{1}$, $B_{-1}$, $B_{1}$ are $p^2 \times p^2$ matrices. Each is block tridiagonal
of "pseudo order" p and each block is a p × p matrix. Specifically

$T = [-I_p,\ R_p,\ -I_p]$  "block tridiagonal" ,

$R_p = [-1,\ 4,\ -1]$  tridiagonal .


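If $E_{\alpha\beta}$ is the p × p matrix with "1" in the $(\alpha,\beta)$ position and zero
elsewhere, then

4.7a)  $A_{-1} = \mathrm{diagonal}[E_{1p}, E_{1p}, \ldots, E_{1p}]$ ,

4.7b)  $A_{1} = \mathrm{diagonal}[E_{p1}, E_{p1}, \ldots, E_{p1}]$ .

Notice that

$E_{1p} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ ,  $E_{p1} = E_{1p}^T$ .

The matrix $B_{-1}$ is the "block" $E_{1p}$ while $B_{1}$ is the "block" $E_{p1}$. That is

$B_{-1} = \begin{bmatrix} 0 & I_p \\ 0 & 0 \end{bmatrix}$ ,  $B_{1} = \begin{bmatrix} 0 & 0 \\ I_p & 0 \end{bmatrix}$ .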

We rewrite (4.6) as

4.9)  $MU = NU + F$

where $M = \mathrm{diagonal}[T, T, \ldots, T]$ and N is made up of $A_{-1}$, $A_{1}$, $B_{-1}$, $B_{1}$.

We see at a glance that M is positive definite and (A.1) is satisfied.
Furthermore, we are dealing with a "block" five point star, that is, the
equations have the same block structure as the original problem. Therefore, our
splitting has "block property A". Thus, we have the basic result: if $\lambda$ is
a root of $\det\{\lambda M - N\} = 0$, so is $-\lambda$ (see [14], [15]). Therefore
(A.2) is satisfied.

Now, N includes only those coefficients which relate points in
the (r,s) block with neighboring points in the four blocks (r+1,s), (r-1,s),
(r,s+1), (r,s-1). We see that each row of N not corresponding to a
corner point of the (r,s) block has at most one "1" and all other entries
are "0". The rows corresponding to corners lead to exactly two "1"s. Thus

4.10)  $\|N\|_\infty = 2$ ,

and (A.3) is satisfied with $N_0 = 2$ .
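
The row count behind (4.10) can be checked mechanically. The sketch below counts, for each grid point, the five-point-star neighbours that lie in a different p × p block, which is the number of 1's in the corresponding row of N. The function names are illustrative, and the sketch assumes p ≥ 2 and Q ≥ 2; for p = 1 every point is its own block and the count can reach four. Points on the edge of the whole grid have fewer neighbouring blocks, so the value 2 is attained only at block corners that touch two other blocks.

```python
def n_row_sums(p, Q):
    """Row sums of N for the p x p block splitting on a P x P grid, P = p*Q.

    Returns a dict mapping the grid index (i, j), 1 <= i, j <= P, to the
    number of five-point-star neighbours of (i, j) lying in another block.
    """
    P = p * Q

    def block(i, j):
        return ((i - 1) // p, (j - 1) // p)

    sums = {}
    for i in range(1, P + 1):
        for j in range(1, P + 1):
            count = 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                inside = 1 <= ni <= P and 1 <= nj <= P
                if inside and block(ni, nj) != block(i, j):
                    count += 1
            sums[(i, j)] = count
    return sums


def n_inf_norm(p, Q):
    """Maximum row sum of N; the entries of N are 0 or 1, so this is its infinity norm."""
    return max(n_row_sums(p, Q).values())
```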

Finally  we  turn  our  attention  to  the  determination  of  q and  the  verifica- 
tion of  (A. 4) . 

Lemma 4.1: Suppose U is a grid vector which satisfies (i), (ii) of (A.4). Then

4.11a)  $\langle NU,U\rangle = \dfrac{4}{p}\,\langle U,U\rangle + E$

where 


4.11b) 

That is, (A.4) is satisfied with
4.12a)  $q = 4/p$

and 

4.12b) 


Proof: We have

4.13)  $\langle NU,U\rangle = \sum_{r,s} U_{rs}^T (NU)_{rs}$ .

Consider a term

4.14)  $U_{rs}^T (NU)_{rs} = U_{rs}^T A_{-1} U_{r-1,s} + U_{rs}^T A_{1} U_{r+1,s} + U_{rs}^T B_{-1} U_{r,s-1} + U_{rs}^T B_{1} U_{r,s+1}$ .

It is easy to see that

4.15)  $U_{rs}^T A_{-1} U_{r-1,s} = \sum_{\mu=1}^{p} u_{(r-1)p,\,(s-1)p+\mu}\; u_{(r-1)p+1,\,(s-1)p+\mu}$ .

Fix $\sigma$, $1 \le \sigma \le p$. We use (ii) to write

$F_\mu = u_{(r-1)p,\,(s-1)p+\mu}\; u_{(r-1)p+1,\,(s-1)p+\mu} = [u_{(r-1)p+\sigma,\,(s-1)p+\mu} + \sigma h \theta_1]\,[u_{(r-1)p+\sigma,\,(s-1)p+\mu} + \sigma h \theta_2]$

where

$|\theta_j| \le B$ ,  $j = 1, 2$ .

Thus

$F_\mu = [u_{(r-1)p+\sigma,\,(s-1)p+\mu}]^2 + 2\varepsilon_1 h + \varepsilon_2 h^2$

where

$|\varepsilon_j| \le (pB)^2$ ,  $j = 1, 2$ .

Therefore, we may replace $F_\mu$ by the average over $\sigma$, $1 \le \sigma \le p$. Thus

$U_{rs}^T A_{-1} U_{r-1,s} = \sum_{\mu=1}^{p} F_\mu = \dfrac{1}{p} \sum_{\sigma,\mu=1}^{p} [u_{(r-1)p+\sigma,\,(s-1)p+\mu}]^2 + E_1$

where

$|E_1| \le 2(pB)^2 h (1 + h)$ .

Each of the other terms in the right-hand side of (4.14) may be treated in a
similar manner. We obtain

$U_{rs}^T (NU)_{rs} = \dfrac{4}{p}\, U_{rs}^T U_{rs} + E_2$

where

$|E_2| \le 16(pB)^2 h$ .

Finally, using (4.13) we see that

$\langle U,NU\rangle = \dfrac{4}{p}\,\langle U,U\rangle + E$

where

$|E| \le 16(pB)^2 Q^2 h \le 16B^2 \cdot \dfrac{1}{h}$ .

Corollary: If one considers the p × p block Jacobi iterative method described
by (4.6)-(4.9) then

$\rho = 1 - \dfrac{\pi^2}{2}\, p\, h^2 + O(h^3)$ .

Proof: Apply Theorem 3.1.
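
The corollary can be compared with a direct computation on a small grid. The following pure-Python sketch builds the five-point matrix A, splits it as A = M - N by p × p blocks, and estimates $\rho(M^{-1}N)$ by a power iteration on two Jacobi sweeps at a time. The function names, the dense Gauss-Jordan inverse and the assumption h = 1/(P+1) are choices made for this illustration only; the approach is practical only for small P.

```python
import math


def five_point_matrix(P):
    """Dense five-point matrix (4 on the diagonal, -1 for neighbours) on a P x P grid."""
    n = P * P
    A = [[0.0] * n for _ in range(n)]
    for j in range(P):
        for i in range(P):
            k = j * P + i
            A[k][k] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < P and 0 <= nj < P:
                    A[k][nj * P + ni] = -1.0
    return A


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small dense M)."""
    n = len(M)
    aug = [row[:] + [float(r == c) for c in range(n)] for r, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matvec(B, x):
    return [sum(bv * xv for bv, xv in zip(row, x)) for row in B]


def block_jacobi_radius(P, p, steps=300):
    """Estimate rho(M^{-1} N) for the p x p block splitting A = M - N."""
    n = P * P
    A = five_point_matrix(P)
    blk = [((k % P) // p, (k // P) // p) for k in range(n)]
    M = [[A[a][b] if blk[a] == blk[b] else 0.0 for b in range(n)] for a in range(n)]
    N = [[M[a][b] - A[a][b] for b in range(n)] for a in range(n)]
    Minv = inverse(M)
    x = [1.0] * n
    ratio = 0.0
    for _ in range(steps):
        # Eigenvalues occur in +/- pairs (block property A), so we apply two
        # sweeps per step; the norm then shrinks by rho**2 in the limit.
        y = matvec(Minv, matvec(N, matvec(Minv, matvec(N, x))))
        norm_y = math.sqrt(sum(v * v for v in y))
        ratio = norm_y / math.sqrt(sum(v * v for v in x))
        x = [v / norm_y for v in y]
    return math.sqrt(ratio)


def corollary_estimate(P, p):
    """Leading terms 1 - (pi^2 / 2) p h^2 of the corollary, with h = 1 / (P + 1)."""
    h = 1.0 / (P + 1)
    return 1.0 - 0.5 * p * (math.pi * h) ** 2
```

For p = 1 the splitting reduces to point Jacobi, whose spectral radius is known to be cos(πh), which gives a simple check of the routine before it is used with larger p.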

We  close  this  section  with  a consideration  of  the  successive  over-relaxation 
iterative  method  based  on  this  splitting. 

Let a parameter $\omega$ be chosen. Then the successive over-relaxation method
based on (4.6) is given by

$\dfrac{1}{\omega} T U^{k+1}_{rs} = A_{-1} U^{k+1}_{r-1,s} + B_{-1} U^{k+1}_{r,s-1} + A_{1} U^{k}_{r+1,s} + B_{1} U^{k}_{r,s+1} + \left(\dfrac{1}{\omega} - 1\right) T U^{k}_{rs} + F_{rs}$ .

Because the basic splitting satisfies block property A, the number $\rho(\omega)$,
which is the related spectral radius, satisfies the equation (see [15])

$(\rho(\omega) + \omega - 1)^2 = \omega^2 \rho^2 \rho(\omega)$ .

Thus, having determined $\rho$, we know $\rho(\omega)$ .
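
As an illustration of how $\rho(\omega)$ follows from $\rho$, the sketch below solves the relation $(\lambda + \omega - 1)^2 = \omega^2 \rho^2 \lambda$ for the root of largest modulus. The function names are ours, and the formula for the optimal parameter is the classical one for splittings with property A; it is quoted here as a known result rather than taken from the text.

```python
import math


def optimal_omega(rho):
    """Relaxation parameter minimising the SOR spectral radius (0 <= rho < 1)."""
    return 2.0 / (1.0 + math.sqrt(1.0 - rho * rho))


def sor_radius(rho, omega):
    """Largest-modulus root lam of (lam + omega - 1)**2 = omega**2 * rho**2 * lam."""
    # With lam = t**2 the relation reduces to t**2 - omega*rho*t + (omega - 1) = 0.
    disc = (omega * rho) ** 2 - 4.0 * (omega - 1.0)
    if disc <= 0.0:
        # Complex conjugate roots t, both with |t|**2 = omega - 1.
        return omega - 1.0
    root = 0.5 * (omega * rho + math.sqrt(disc))
    return root * root
```

With ω = 1 the function returns ρ², the Gauss-Seidel value, and at `optimal_omega(rho)` it returns approximately ω - 1.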

5.  p LINE METHOD: I.  Again, let p be a fixed integer and assume
that (4.1) holds.

The interior grid vector is arranged into subgrid vectors $\{U_j\}$ of $Pp$
entries as follows

5.1)  $U_j = \{u_{\sigma,\,(j-1)p+\mu};\ 1 \le \sigma \le P,\ 1 \le \mu \le p\}$ ,  $1 \le j \le Q$ .

That is, $U_j$ consists of the values associated with the j-th block of p
lines. We now have

5.2)  $U = \{U_1, U_2, \ldots, U_Q\}^T$ .


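Within each $U_j$ the ordering is the same. That is, let $G(j,\mu)$ be the
P-vector associated with the $\mu$-th horizontal line within the j-th block of p
horizontal lines, i.e.

$G(j,\mu) = \begin{bmatrix} u_{1,\,(j-1)p+\mu} \\ u_{2,\,(j-1)p+\mu} \\ \vdots \\ u_{\sigma,\,(j-1)p+\mu} \\ \vdots \\ u_{P,\,(j-1)p+\mu} \end{bmatrix}$ ,

then

$U_j = \begin{bmatrix} G(j,1) \\ G(j,2) \\ \vdots \\ G(j,p) \end{bmatrix}$ .

The discrete Poisson equation (2.5a), (2.5b) now takes the form

5.3)  $-R\,U_{j-1} + T\,U_j - R^T U_{j+1} = F_j$

where T and R are $pP$ by $pP$ matrices. In fact, T is block tridiagonal
with

5.4a)  $T = [-I_P,\ T_P,\ -I_P]_p$ ,  $T_P = [-1,\ 4,\ -1]_P$

and

5.4b)  $R = \begin{bmatrix} 0 & I_P \\ 0 & 0 \end{bmatrix}$ .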


This decomposition is used to make the splitting

5.5)  $A = M - N$

where

5.6a)  $M = \mathrm{diagonal}(T)$ ,

5.6b)  $N = [R,\ 0,\ R^T]$ .

It is immediately clear that $M = M^T$ and is positive definite since each
T is positive definite. Once more, this splitting satisfies block property A
(see [9], [10]). Thus (A.1) and (A.2) are satisfied. From the structure of N
we see that (A.3) is satisfied with $N_0 = 2$.

Once more we seek to determine an appropriate q and verify (A.4).

Lemma 5.1: Suppose U is a grid vector which satisfies (i) and (ii) of (A.4).
Then

5.7a)  $\langle NU,U\rangle = \dfrac{2}{p}\,\langle U,U\rangle + E$

where

$|E| \le 8B^2 p$ .

That is, (A.4) is satisfied with

$q = 2/p$ ,  $K = 8B^2 p$ .

Proof: We have

$\langle NU,U\rangle = \sum_{j=1}^{Q} U_j^T (NU)_j$ .

Consider a term

$U_j^T (NU)_j = U_j^T R\,U_{j-1} + U_j^T R^T U_{j+1}$ ,

$U_j^T R\,U_{j-1} = \sum_{\sigma=1}^{P} u_{\sigma,\,(j-1)p+1}\; u_{\sigma,\,(j-1)p}$ .

Fix $\mu$, $1 \le \mu \le p$. Then

$[u_{\sigma,\,(j-1)p+1}]\,[u_{\sigma,\,(j-1)p}] = [u_{\sigma,\,(j-1)p+\mu} + \varepsilon_1]\,[u_{\sigma,\,(j-1)p+\mu} + \varepsilon_2]$

where

$|\varepsilon_j| \le Bph$ .

Therefore, averaging once more, we have

$\ldots\,]^2 + 4hp^2B^2$ .

Thus, as in section 4

$\langle U,NU\rangle = \dfrac{2}{p}\,\langle U,U\rangle + E$

where

$|E| \le 8B^2 p$ .

Corollary: If we consider the p line iterative method described by
(5.3)-(5.6b), then

$\rho = 1 - p\pi^2 h^2 + O(h^3)$ .

Proof: Apply Theorem 3.1.

Remark: A careful look at this section shows that $K = 8B^2ph$ and hence we
easily obtain

$\rho = 1 - p\pi^2 h^2 + O(h^4)$ .
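
As a consistency check on the Remark (this is not part of the original argument), the case p = 1 can be worked by hand. We assume the grid spacing h = 1/(P+1), which is consistent with (6.4c), and use the classical closed form for the one-line Jacobi method.

```latex
% One-line Jacobi (p = 1): M = diagonal(T_P), N couples adjacent lines.
% The eigenvalues of M^{-1}N are 2\cos(s\pi h) / (4 - 2\cos(r\pi h)), 1 \le r, s \le P, so
\rho = \frac{\cos \pi h}{2 - \cos \pi h} .
% With x = \pi^2 h^2 and \cos \pi h = 1 - x/2 + x^2/24 + O(x^3):
\rho = \frac{1 - x/2 + x^2/24}{1 + x/2 - x^2/24} + O(x^3)
     = 1 - x + \tfrac{7}{12}\,x^2 + O(x^3)
     = 1 - \pi^2 h^2 + \tfrac{7}{12}\,\pi^4 h^4 + O(h^6) .
```

This agrees with $\rho = 1 - p\pi^2 h^2 + O(h^4)$ for p = 1.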

6.  p-LINE  METHOD  II.  In  this  section  we  approach  the  p-line  method  of 
section  5 with  another  method  of  analysis.  The  results  obtained  are  weaker, 
but  the  approach  may  well  have  applications  in  cases  where  the  analysis  of 
section  3 does  not  apply. 


Lemma 6.1: Let $u_n(x)$ denote the Chebychev polynomial of the second kind of
order n,

$u_0(x) = 1$ ,  $u_1(x) = 2x$  and  $u_{n+1}(x) = 2x\,u_n(x) - u_{n-1}(x)$ .

If $x \ge 1$, then

6.1a)  $u_n(x) \ge u_{n-1}(x) + 1$ ,  $n \ge 1$ ,

6.1b)  $u_n(x) \ge n + 1$ ,  $n \ge 1$ ,

and

6.1c)  $\dfrac{d}{dx} u_n(x) \ge \dfrac{d}{dx} u_{n-1}(x) + 2n$ .

Proof: Apply induction.
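
The inequalities of Lemma 6.1 are easy to spot-check numerically. The sketch below evaluates $u_n$ and its derivative from the three-term recursion (differentiated once for the derivative) and tests (6.1a)-(6.1c) at a single point. The function names and the tolerance are assumptions made for the example; a numerical check of this kind is of course no substitute for the induction proof.

```python
def cheb_u(n, x):
    """Values [u_0(x), ..., u_n(x)] and derivatives [u_0'(x), ..., u_n'(x)]."""
    u, du = [1.0, 2.0 * x], [0.0, 2.0]
    for k in range(1, n):
        u.append(2.0 * x * u[k] - u[k - 1])
        # Derivative of the recursion: u_{k+1}' = 2 u_k + 2x u_k' - u_{k-1}'.
        du.append(2.0 * u[k] + 2.0 * x * du[k] - du[k - 1])
    return u[: n + 1], du[: n + 1]


def lemma_6_1_holds(n_max, x, tol=1e-9):
    """Check (6.1a)-(6.1c) for 1 <= n <= n_max at a single point x."""
    u, du = cheb_u(n_max, x)
    return all(
        u[n] >= u[n - 1] + 1 - tol
        and u[n] >= n + 1 - tol
        and du[n] >= du[n - 1] + 2 * n - tol
        for n in range(1, n_max + 1)
    )
```

At x = 1 the first two inequalities hold with equality, since $u_n(1) = n + 1$.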

Corollary:  $u_n'(x) \ge 2$ ,  $n \ge 1$ and $x \ge 1$ .

Lemma 6.2: Let $B = [-I,\ 2S,\ -I]_m$ where S and I are n × n. Let

$U_j(S) = u_j(S)$ ,

that is, $U_j(S)$ is an n × n matrix obtained by evaluating the
Chebychev polynomial $u_j$ at S. If $U_m(S)$ is nonsingular, then

6.2)  $(B^{-1})_{ij} = \begin{cases} U_m^{-1}(S)\, U_{j-1}(S)\, U_{m-i}(S) , & i \ge j , \\ U_m^{-1}(S)\, U_{i-1}(S)\, U_{m-j}(S) , & i < j . \end{cases}$

Proof: See Theorem 1 of [3].


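The quantity of interest, $\rho$, is the spectral radius of $M^{-1}N$. Since

$M^{-1}N = [T^{-1}R,\ 0,\ T^{-1}R^T]$

(see section 5) we may apply Lemma 6.2 to obtain $T^{-1}$ and hence $M^{-1}N$. We
find that

$M^{-1}N = [D,\ 0,\ E]$

where

6.3a)  $D = \begin{bmatrix} 0 & \cdots & 0 & U_p^{-1}U_{p-1} \\ 0 & \cdots & 0 & U_p^{-1}U_{p-2} \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & U_p^{-1}U_{0} \end{bmatrix}$ ,

6.3b)  $E = \begin{bmatrix} U_p^{-1}U_{0} & 0 & \cdots & 0 \\ U_p^{-1}U_{1} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ U_p^{-1}U_{p-1} & 0 & \cdots & 0 \end{bmatrix}$

and

6.3c)  $U_j = u_j(\tfrac{1}{2} T_P)$ .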


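If Q denotes the unitary matrix which diagonalizes $T_P$ (and hence $\tfrac{1}{2}T_P$),
then

6.4a)  $Q^{-1} u_j(\tfrac{1}{2} T_P)\, Q = u_j(\tfrac{1}{2} Q^{-1} T_P Q) = \mathrm{diag}\{u_j(\lambda_r)\}$ ,  $r = 1, 2, \ldots, P$ ,

and

6.4b)  $Q^{-1} u_p^{-1}(\tfrac{1}{2} T_P)\, Q = u_p^{-1}(\tfrac{1}{2} Q^{-1} T_P Q) = \mathrm{diag}\{u_p^{-1}(\lambda_r)\}$ ,  $r = 1, 2, \ldots, P$ ,

where

6.4c)  $\lambda_r = 2 - \cos(r\pi h) > 1$

is the r'th eigenvalue of $\tfrac{1}{2} T_P$ .

Let

$\tilde{Q} = \mathrm{diag}\{Q, Q, \ldots, Q\}$ ;

we see that

6.5)  $\tilde{Q}^{-1} M^{-1} N\, \tilde{Q} = [Q^{-1} D Q,\ 0,\ Q^{-1} E Q]$ .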


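Thus, applying the Gershgorin circle theorem

6.6)  $\rho \le B_p = \max_{1 \le j \le p,\ 1 \le r \le P} \{u_p^{-1}(\lambda_r)\,[u_{p-j}(\lambda_r) + u_{j-1}(\lambda_r)]\}$ .

Lemma 6.3: If $x \ge 1$, then

6.7)  $u_0(x) + u_{p-1}(x) \ge u_{j-1}(x) + u_{p-j}(x)$ ,  $j = 1, 2, \ldots, p$ .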

Proof: For $x \ge 1$ and $i \ge 2$, from the basic recursion formula and (6.1b)
we have

$u_i(x) - u_{i-1}(x) \ge u_{i-1}(x) - u_{i-2}(x)$ .

Thus, if $n > m > 0$ we have

$u_n(x) - u_{n-1}(x) \ge u_m(x) - u_{m-1}(x)$ ,

or

6.8)  $u_n(x) + u_{m-1}(x) \ge u_m(x) + u_{n-1}(x)$ .

Of course (6.7) is true for j = 1 and j = p. Assume that (6.7) is
true for a value of $j = \sigma$, $1 \le \sigma \le p - 2$. Then

6.9)  $u_0(x) + u_{p-1}(x) \ge u_{\sigma-1}(x) + u_{p-\sigma}(x)$ .

We may assume $\sigma < p - \sigma$. Then applying (6.8) with $n = p - \sigma$ and $m = \sigma$
we find

$u_{p-\sigma}(x) + u_{\sigma-1}(x) \ge u_{p-(\sigma+1)}(x) + u_{\sigma}(x)$ .

That is

$u_{p-\sigma}(x) + u_{\sigma-1}(x) \ge u_{j}(x) + u_{p-(j+1)}(x)$ .

Substitution of this result into (6.9) gives (6.7) for the larger value of j
and the lemma is proven.


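Theorem 6.1: With $B_p$ defined by (6.6) we have

6.10)  $\rho(M^{-1}N) \le B_p = 1 - \dfrac{p}{2}\,\pi^2 h^2 + O(h^4)$ .

Proof: From Lemma 6.3 we see that

$B_p = \max_{r} \dfrac{1 + u_{p-1}(\lambda_r)}{u_p(\lambda_r)}$

where the $\lambda_r$ are given by (6.4c). It is not difficult to see that

$B_p(\lambda) = \dfrac{1 + u_{p-1}(\lambda)}{u_p(\lambda)}$

is a monotone non-increasing function of $\lambda$ for $\lambda > 1$, so the maximum is
attained at the smallest eigenvalue $\lambda_1 = 2 - \cos \pi h$. Thus

$B_p = \dfrac{1 + u_{p-1}(2 - \cos \pi h)}{u_p(2 - \cos \pi h)}$ .

Expansion of $B_p(\lambda)$ about $\lambda = 1$ yields (6.10).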
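
The bound $B_p$ is cheap to evaluate, which makes it easy to compare with the leading terms of the section 5 estimate. In the sketch below the function names are ours; only the formula for $B_p$ at $\lambda = 2 - \cos \pi h$ and the expression $1 - p\pi^2h^2$ come from the text.

```python
import math


def cheb_u_value(n, x):
    """u_n(x) from the three-term recursion, with u_0 = 1 and u_1 = 2x."""
    prev, cur = 1.0, 2.0 * x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


def gershgorin_bound(p, h):
    """B_p = (1 + u_{p-1}(lam)) / u_p(lam) evaluated at lam = 2 - cos(pi*h)."""
    lam = 2.0 - math.cos(math.pi * h)
    return (1.0 + cheb_u_value(p - 1, lam)) / cheb_u_value(p, lam)


def section5_estimate(p, h):
    """Leading terms 1 - p*pi^2*h^2 of the estimate from section 5."""
    return 1.0 - p * (math.pi * h) ** 2
```

For p = 1 the bound reduces to $1/(2 - \cos \pi h)$. For small h the bound lies above the section 5 estimate, which reflects the factor p/2 in (6.10) against the factor p in section 5.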

Final Remark: Since it is better to slightly overestimate the relaxation
parameter in successive overrelaxation than to underestimate it, it might
appear that the estimate of this section is preferable to that of section 5
for coarse meshes. However, numerical experiments contradict this idea.

ABSTRACT.  A major premise of this paper is that the high cost
and  low  reliability  of  software  are  due  to  lack  of  precision  and  systematic 
procedures  in  the  early  stages  of  the  software  development  cycle.  The 
greater  part  of  the  paper  is  devoted  to  a survey  of  some  techniques  for 
Software  Analysis  and  Design.  Some  underlying  principles  and  common- 
alities are  noted,  and  some  directions  for  needed  research  are  discussed. 
Since  the  paper  is  tutorial  in  nature,  a large  set  of  references  is 
included. 

1.  INTRODUCTION.  Software development can be viewed as
consisting  of  three  phases:  Analysis,  Design,  and  Implementation 
(Figure  1).  Analysis  produces  a functional  architecture,  which  describes 
what  the  system  must  do.  Design  produces  a system  architecture, 
which  describes  how  it  will  be  done  by  determining  structure  (parts  and 
interfaces)  and  choosing  algorithms.  Implementation  produces  an 
operational  system. 


Figure  1.  The  System  Development  Cycle 


Analysis is part of the broader activity of Requirements Definition.
Requirements Definition also consists of context analysis and the
specification of design constraints [RS77]. However, we will use both
terms, Analysis and Requirements Definition, to refer to essentially
the same stage of development. Similarly, Implementation incorporates
Programming, as well as other activities, but we will use these two
terms more or less interchangeably.

Much  attention  has  been  paid  to  the  programming  phase,  including 
research  and  development  efforts  in  structured  programming  [D72, 
McGK75,  McG75],  language  design  for  reliable  software  [Wo77],  and 
programming  methodology  [Gr77].  Much  less  work  has  been  done  toward 
improving  the  analysis  and  design  phases.  This  paper  discusses  a few 
analysis  and  design  methods  that  have  emerged  over  the  past  five  or  so 
years.  This  survey  should  serve  as  an  introduction  for  those  wishing 
an  overview  of  the  area. 

Before  beginning  with  the  discussion  of  the  methods,  we  present 
some  facts  concerning  software  costs;  the  evidence  concerning  life  cycle 
costs  explains  our  emphasis  on  the  early  stages  of  software  development. 
Then,  the  following  section  of  the  paper  discusses  the  following  methods: 

• Jackson  Design  Technique 

• Structured  Design  Method 

• HIPO 

• SADT 

• PSL/PSA

• SREM 

Although none of these methods covers the full range of software
development problems, each method offers important concepts which aid our
understanding of various aspects of the problems.

2.  SOFTWARE COSTS AND RELIABILITY.  Reliability is concerned
with the probability of failure of a system and the impact of the
failures [G78]. However, for now we will not give the term "reliability"
a precise mathematical meaning. A system will be deemed reliable if
certain qualitative, empirical criteria are met: few failures with only
low impact, few errors found, low number of man hours spent on
maintenance, high user satisfaction, and so on. It is in this sense that we
hope methods with the intentions of the ones surveyed here will
contribute to the reliability of software.

Improving reliability is closely tied to costs; that is, the costs
incurred are directly related to the effort involved in achieving and
maintaining an acceptable level of reliability. Several recent studies
have indicated what the true costs of software are and where these costs
are primarily incurred. First, the realization that maintenance
can consume about 75 percent of software effort [B76] has motivated a
concern for total life cycle costs, not just development costs. Second,
detecting and correcting errors have been responsible for almost half
of life cycle costs [A76]. Third, analysis and design errors are, by
far, the most costly and crucial types of errors [B75]. The conclusion
we draw from these studies (and from our own experience) is that a
larger proportion of the development effort should be spent on analysis
and design in order to reduce life cycle costs, for preventing errors
is much cheaper than correcting errors that have been built in. And a
well structured and documented solution to a clearly defined problem is
the legacy that system operation needs in order to understand and to
change the existing system with confidence.

3.  SOME  ANALYSIS  AND  DESIGN  TECHNIQUES.  Each  of  the 
techniques  we  will  discuss  can  be  characterized  with  respect  to  several 
properties: 

• structuring principles, which provide a way of conceptualizing
the system as parts and the relationships between
them.

• graphical description technique, a language of boxes and
arrows that makes explicit the notions of structuring
being used.

• procedure for creating descriptions, which vary in how
prescriptive or formal they are.

• emphasis on analysis, design, or implementation.

• adequacy or "goodness" criteria, for example, how to
judge the goodness of a design, or how to tell whether
a set of requirements is "complete" or "consistent."

• pragmatics, which deal with how well and on what range
of projects the various techniques have worked in
practice; this involves not only where in a phased approach a
technique was especially applicable but also how easily
managed and transferred to new groups it proved to be.

4.  THE  JACKSON  TECHNIQUE.  The  Jackson  Technique  has 
many  adherents,  especially  in  the  United  Kingdom  and  Europe.  There 
is  even  a Jackson  Method  User  Group  with  regular  meetings.  The 
primary  reference  is  [J75]. 

The technique's basic structuring principle is stated by Jackson
as follows: "program structure should mirror problem structure."
Although we often hear such a principle espoused, Jackson means
something very simple and specific by it in practice.

By the "structure of the problem," Jackson means the structure
of the input and output data. The technique is often used for data
processing programs for which the data structures (or file structures)
are well understood and easily stated. Structuring is based on a
context-free grammar extended with the Kleene star (*) notation.

Three primitive constructs are used to represent data structure
(Figure 2a).

The first construct represents the decomposition of a data item
into a sequence of sub-items. In the example in Figure 2a, A consists
of "B followed by C followed by D". The iteration construct shows zero
or more repetitions: A consists of a sequence of zero or more B's.
Selection is denoted by a superscript zero: A consists of either B or
C or D. Arbitrarily complex hierarchical structures can be constructed
using the three basic notations (Figure 2b).

A program structure can be represented in the same notation as
can data structure. In fact, the three constructs correspond to the
three basic structured programming constructs, SEQUENCE, DO-WHILE,
IF-THEN-ELSE (or CASE). Therefore, any program structure
represented with Jackson's constructs has a direct analog in structured code.

Figure  3 shows  the  input  and  output  data  structures  for  a system 
that  keeps  a record  of  the  movement  of  parts  through  a company's 
stockroom.  The  input  file  consists  of  a series  of  part  groups,  each 
part  group  consists  of  a series  of  movement  records,  and  a movement 
record  is  either  an  "issue"  or  "receipt"  record.  The  output  is  a report 
consisting  of  a heading,  followed  by  a body  which  consists  of  several 
lines  of  output,  each  line  corresponding  to  a Part  Group. 

Program design involves finding a program whose structure is
compatible with both the input and output structures. A program
structure compatible with the files of Figure 3 is shown in Figure 4.
Discovering the program is the creative step in the design process, like
the crafting of an apt loop invariant in structured programming [D72,
D76, McGK75]. It is clear that the program can treat both the input
and output data structures as sequential files and perform its function
in a straightforward manner.
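
To illustrate how program structure can mirror the data structures of the stockroom example, here is a minimal Python sketch. The record layout (part, kind, quantity), the content of the report line (a net movement per part group) and the function name are assumptions made for the example; the text specifies only the shape of the input and output files.

```python
from itertools import groupby


def stock_report(records):
    """Produce the report lines for a file of (part, kind, quantity) records.

    Assumes `records` is sorted by part, so that each run of equal part
    values forms one part group.
    """
    lines = ["PART  NET MOVEMENT"]                  # report: heading, then body
    # Iteration over part groups (input) mirrors iteration over lines (output).
    for part, group in groupby(records, key=lambda rec: rec[0]):
        net = 0
        for _, kind, quantity in group:             # iteration: movement records
            if kind == "receipt":                   # selection: receipt ...
                net += quantity
            elif kind == "issue":                   # ... or issue
                net -= quantity
            else:
                raise ValueError(f"unknown movement kind: {kind!r}")
        lines.append(f"{part}  {net:+d}")           # one output line per part group
    return lines
```

The two nested loops and the two-way branch correspond one for one to the iteration and selection constructs in the input structure, and each pass of the outer loop emits exactly one line of the report body.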

Of course, it is not always possible to find a program structure
that is compatible with both the input and output structures. This
situation, called a "structure clash," is resolved by the creation of an
intermediate file. Then it is sufficient to find two subprograms, one that
is compatible with the input and the intermediate file, and one that is
compatible with the intermediate file and the output. For an example
in more detail, see [J75].

5.  STRUCTURED DESIGN.  The Structured Design method
resulted from ten years of research at IBM by Constantine and others
[SMC74]. Its most notable characteristic is the provision of specific
qualitative criteria for judging the "goodness" of a design.

A "module" is defined as a named set of contiguous program
statements. A "good" module is cohesive. Cohesion is a measure of
intramodule strength. The possible levels of cohesion, in decreasing
order of cohesiveness, and therefore in order of diminishing
desirability, are shown in Figure 4. The most desirable level is functional.

Figure 2.  Structure Notations


1.  Functional: module performs a single specific function, e.g.,
"write a record to output file"

2.  Clustered: module is a group of functions sharing a data
structure, usually to hide its representation from the rest
of the system; only one function is performed per
invocation, e.g., "symbol table with insert and look-up
functions"

3.  Sequential: module action comprises several functions
that pass the data along, e.g., "update and write a record"

4.  Communicational: module action consists of several
logical functions operating on some data, e.g., "print and
punch a file"

5.  Procedural: module elements are grouped for algorithmic
reasons, e.g., "loop body"

6.  Temporal: module functions are all related in time, e.g.,
"initialization"

7.  Logical: module can perform a general function (i.e.,
several logically related functions); an invoking
parameter value determines the specific function, e.g.,
"general-error-routine" called with an error-number

8.  Coincidental: no real relationship between module
elements that are grouped for packaging considerations, e.g.,
"common subexpression"


Figure 4.  Cohesion Levels

A complementary criterion is coupling. Modules are formed
so as to minimize the connections between modules. The levels of
coupling are shown in Figure 5, in order of increasing module
interdependency. Data coupled modules are the most loosely coupled and,
hence, most desirable. Content coupled modules are the least desirable.
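
The difference between the loosest level, data coupling, and a tighter one, control coupling, can be seen in a small sketch. The payroll-style functions below are hypothetical and serve only to illustrate the two levels.

```python
def net_pay(gross, tax_rate):
    """Data coupled: the caller passes only the data elements that are needed."""
    return gross * (1.0 - tax_rate)


def format_amount(amount, as_cents):
    """Control coupled: the flag `as_cents` steers this module's flow of control."""
    if as_cents:
        return str(round(amount * 100))
    return f"{amount:.2f}"


def format_dollars(amount):
    """One way to remove the flag: a separate module for each function."""
    return f"{amount:.2f}"


def format_cents(amount):
    """Counterpart of format_dollars; the caller chooses by name, not by flag."""
    return str(round(amount * 100))
```

A caller of `format_amount` must know how the flag is interpreted inside the module, which is the extra interdependency that control coupling introduces.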

Using the Structured Design criteria, a design can be assessed
and reworked to improve its quality, to enhance its expected performance,
or to package it better for its target environment. Some design quality
might be sacrificed as a result of such engineering tradeoff analyses,
but at least Structured Design has made these decisions and their
qualitative costs more explicit.

Besides an evaluative framework, Structured Design prescribes a
methodology for creating a design (i.e., a system architecture). The
first step is to describe the data flow characteristics of the system
using data flow charts, in which the nodes represent functions and the
arrows represent the flow of data.

The next step is to decide where the "top" of the system should
be. If the program had to be specified as three steps, 1) obtain input,
2) perform transformation, 3) emit output, how should the data flow
diagram be partitioned? This process is called "finding the most
abstract form of the input and output," shown here:


(Two modules are coupled if . . . )

1.  Data: all communication between them is via arguments
that are data elements

2.  Stamp: their communication includes an argument that
references a data structure (some of whose fields are
not needed)

3.  Control: an argument from one knowingly influences the
flow-of-control of the other, e.g., a flag

4.  External: they reference an externally declared data
element

5.  Common: they reference an externally declared (i.e.,
common) data structure (some of whose fields are not
needed)

6.  Content: one references the contents of the other


Figure 5.  Coupling Levels

The methodology hereafter consists of applying this procedure
(recursively) to the Obtain, Perform, and Emit parts in order to obtain
a hierarchical program structure. At each level, each of the three
(abstract) modules is treated as a subproblem. Hierarchical structure
charts are used to document the resulting design.
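
The first level of such a hierarchy might look as follows in code. This is a hypothetical sketch: the task (averaging a column of numbers) and all function names are invented for illustration, and each of the three subordinate modules could itself be split in the same way.

```python
def obtain_input(lines):
    """Obtain: raw text lines -> most abstract form of the input (a list of numbers)."""
    return [float(line) for line in lines if line.strip()]


def perform_transformation(values):
    """Perform: abstract input -> abstract output (here, the mean)."""
    return sum(values) / len(values) if values else 0.0


def emit_output(mean):
    """Emit: abstract output -> formatted report line."""
    return f"mean = {mean:.2f}"


def main_module(lines):
    """Top module: coordinates the three subordinate modules."""
    return emit_output(perform_transformation(obtain_input(lines)))
```

The top module sees only the most abstract forms of the data; the details of parsing and formatting stay inside the Obtain and Emit branches.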

Structured Design is eminently transferable, especially since the
morphology for some standard designs has already been identified [YC75].
Hughes Aircraft has extended the method to obtain additional guidelines
for designing real-time military systems such as software for radar
systems [Je75].

6.  HIPO.  HIPO (hierarchy plus input-process-output) was
originally developed as a software design aid and documentation
technique [H74, Ka76, S76]. It has become a documentation standard in
some of IBM's program logic manuals. Recently there have been
attempts to use HIPO in developing requirements specifications [Jo76].

A HIPO package consists of a visual table of contents (Figure 6),
overview HIPO diagrams, and detail HIPO diagrams. The visual table
of contents is a tree structure that gives the hierarchy of the HIPO
diagrams. Overview diagrams form the upper levels and are concerned
with general functions, inputs (files), and outputs. Detail diagrams
form the lower levels and are concerned with subfunctions, specific
inputs (records, fields), outputs, and internal data flow. Detail
diagrams often have an extended description section which allows for
additional notes, details, etc., and contains references to the
corresponding code (module names, labels, etc.). The diagrams are suitably
cross-referenced.


Figure 6.  HIPO Table of Contents Showing Hierarchy
of System Functions

A HIPO diagram (Figure 7) has, in left to right order, an input,
a process, and an output section. Each diagram refines a single
function into subfunctions, which are listed as steps in the process section.
Inputs and outputs are placed in the appropriate sections. Arrows
connect process steps with their inputs/outputs. A HIPO diagram
primarily shows function and data flow, but the process section,
especially of detail diagrams, can show a limited amount of control flow.

A recurring problem in the use of HIPO has been in the specification of
interfaces between functions in the hierarchy. The notation does not
mandate the matching of interfaces or even prescribe a specific standard
form for them.

7.  SADT.  SADT® (Structured Analysis and Design Technique)
is an integrated methodology for requirements definition and the design
and specification of software systems. It has been applied to a variety
of complex system problems [BM77] including military training [ST76],
a government agency's financial management system, software design
for a large data base system and for a PABX telephone system, and
computer-aided manufacturing [AF74]. "ITT Europe, for example, has
used SADT since early 1974 for analysis and design of both hardware/
software systems (telephonic and telegraphic) and nonsoftware
people-oriented problems (project management and customer engineering)"
[RS77].

SADT is built on a graphical language [R77] whose primary
structuring principle is hierarchical decomposition (Figure 8). The basic
unit of expression in SADT is the box (Figure 9) which represents an
activity. Arrows representing data interfaces connect boxes to form
diagrams. A typical SADT diagram is shown in Figure 10, which
describes the design phase of the software development process. Each
box in the diagram represents an activity that is part of the containing
design activity. The arrows are all data that show how the activities
are interrelated. An activity box transforms the input (arrow entering
from the left) into an output (arrow leaving from the right) under
circumstances imposed by control (arrow entering at the top). The
unconnected arrows are interfaces between "design" and the other
phases of software development.

An SADT system description is a hierarchically organized set
of diagrams each representing a limited amount of detail (Figure 11).
The rules of language usage enforce unfolding the system's structure a
piece at a time from the top down. An SADT model is a set of diagrams
that describe a system in some bounded context from an identified
viewpoint and for a particular purpose. An SADT system description usually
consists of multiple models from different viewpoints (e.g., user,
manager, implementor, etc.). These models are interconnected where
details are shared (Figure 12). The process of interconnecting or
"tying" models together is a major vehicle for checking the consistency
and completeness of the system description.

[SADT is a trademark of SofTech, Inc.]

Figure 9.  The SADT Unit of Expression


Figure  10.  The  Design  Process 


Figure 11.  SADT Model


Figure 12.  Interconnected SADT Models (multiple models are required to
represent the multiple viewpoints of requirements definition)


The discussion of activity modeling in SADT is only half of the
story. Data models are also part of the SADT methodology. A data
model shows the top-down decomposition of the data aspects of a system.
The notation for data modeling is exactly dual to that for activities: the
boxes represent data and the arrows represent activities that create,
consume, and use the data. Tying together activity and data models
enforces further checks on the system's integrity.

Most  important  system  concepts  such  as  feedback,  data  flow, 
sequencing,  state  transition,  and  component  interconnection  can  be 
expressed  easily  in  clear  SADT  graphic  form.  Design  principles  such 
as  top-down  decomposition,  modularity,  coupling  and  cohesion,  stepwise 
refinement,  abstract  data  types,  monitors,  levels  of  abstraction,  and 
information  hiding  are  recognizable  and  directly  expressible  within 
SADT. 

In  addition  to  the  use  of  a graphic  notation,  the  SADT  methodology 
prescribes  a system  development  procedure  with  well-defined  personnel 
roles  [Sof76].  The  methodology  imposes  a structure  on  the  system 
development process itself, not just on the resultant system [GM77].

8. PSL/PSA. The Problem Statement Language/Problem
Statement Analyzer (PSL/PSA) system is the first major computer
support  tool  oriented  toward  helping  the  system  analyst.  As  Boehm 
notes:  "It  is  the  only  system  to  have  passed  a market  and  operations 
test;  several  commercial,  aerospace,  and  government  organizations 
have  paid  for  it  and  are  successfully  using  it.  The  U.  S.  Air  Force  is 
currently  using  and  sponsoring  extensions  . . . under  the  Computer 
Aided Requirements Analysis (CARA) program" [Bo76]. PSL/PSA was
developed as part of the ISDOS (Information System Design and
Optimization System) research project at the University of Michigan under the
direction of Professor Daniel Teichroew. It is coded largely in ANSI
FORTRAN and "is operational on most larger computing environments
which support interactive use" [TH77].

System  descriptions  are  expressed  using  PSL  and  entered  into  a 
data  base  by  PSA.  PSL  system  descriptions  involve  "objects"  (of 
which there are more than 20 permissible types such as input,
interface, process, set, etc.) and "relationships" (such as part of, derive,
update, etc.) between objects. Objects can be tagged with "properties"
(i.e., statements describing an object). A large number of reports
can be extracted. They include the Formatted Problem Statement, giving
all properties and relationships for a specific object; the Extended Picture,
depicting data flows in graphical form; the Structure Report, presenting
hierarchical composition information; and other summary, change,
and data base analysis reports.
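
To make the object/relationship/property scheme concrete, the following is a minimal sketch of such a data base. It is not PSL syntax and does not reproduce PSA's report formats; the class names, the example objects, and the report layouts are hypothetical, chosen only to show how two of the reports named above could be produced from stored objects and relationships.

```python
from dataclasses import dataclass, field


@dataclass
class PslObject:
    name: str
    kind: str  # e.g. "input", "interface", "process", "set"
    properties: list = field(default_factory=list)


class ProblemStatementDB:
    """Toy stand-in for the data base that an analyzer would maintain."""

    def __init__(self):
        self.objects = {}
        self.relationships = []  # (subject, relation, target) triples

    def add_object(self, name, kind, *properties):
        self.objects[name] = PslObject(name, kind, list(properties))

    def relate(self, subject, relation, target):
        # Reject relationships that mention an undefined object.
        for name in (subject, target):
            if name not in self.objects:
                raise KeyError(f"undefined object: {name}")
        self.relationships.append((subject, relation, target))

    def formatted_problem_statement(self, name):
        """List every property and relationship of one object."""
        obj = self.objects[name]
        lines = [f"{obj.kind} {obj.name}"]
        lines += [f"  property: {p}" for p in obj.properties]
        lines += [
            f"  {s} {r} {t}"
            for s, r, t in self.relationships
            if name in (s, t)
        ]
        return "\n".join(lines)

    def structure_report(self, root, depth=0):
        """Indented hierarchy built from 'part of' relationships."""
        lines = ["  " * depth + root]
        for s, r, t in self.relationships:
            if r == "part of" and t == root:
                lines += self.structure_report(s, depth + 1)
        return lines


# Hypothetical example system.
db = ProblemStatementDB()
db.add_object("payroll", "process", "runs weekly")
db.add_object("compute pay", "process")
db.add_object("pay record", "set")
db.relate("compute pay", "part of", "payroll")
db.relate("compute pay", "update", "pay record")
```

Because `relate` refuses undefined objects, even this toy version performs a small consistency check at entry time, which is the kind of service a computer-supported analyzer adds over a purely manual description.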

9. SREM. TRW, working for the U.S. Army Ballistic Missile
Defense Advanced Technology Center (BMDATC), has developed a
significant extension to the PSL/PSA approach [AlB76, Al77, Be77,
DV77] called Software Requirements Engineering Methodology, SREM.
It is addressed explicitly to the real-time process control types of
problems  represented  by  ballistic  missile  defense.  The  system  has 
been implemented on an Advanced Scientific Computer and is being
transferred  to  a CDC  7600.  It  supports  multi-color  graphics  I/O, 
does  requirements  consistency  checks,  has  a general  query  capability, 
and  allows  for  system  simulation  executions  to  be  extracted. 

As  an  approach  to  analysis,  SREM  regards  messages  and  entities 
as  the  system  outside  (which  is  equated  with  system  "top").  After  these 
are  identified,  the  necessary  processing  path  for  each  message  is  defined. 
Paths  connected  to  a common  interface  are  combined  into  a requirements 
network,  R-net.  System  level  requirements  are  mapped  onto  the  various 
processing  paths.  To  guarantee  testable  requirements,  validation  points 
are  located  appropriately  along  processing  paths.  SREM  then  attempts 
to demonstrate analytic feasibility by simulation. The input Requirements
Statement Language, RSL, is based on four primitives: Elements
(such as message, interface, etc.), Attributes (such as maximum value,
entered by, etc.), Relationships (such as composes, connects to, etc.),
and Structures (representing the 2-dimensional R-nets).
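
The analysis steps above (one processing path per message, paths grouped into an R-net by common interface, requirements mapped onto paths, validation points placed for testability) can be sketched as follows. This is an illustration only and not RSL; the class, the function names, and the example messages and requirements are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProcessingPath:
    message: str            # the input message this path processes
    interface: str          # the interface the message arrives on
    steps: list             # ordered processing step names
    validation_points: set  # steps at which results are recorded for testing


def build_rnets(paths):
    """Combine paths that share an interface into one requirements network."""
    rnets = defaultdict(list)
    for path in paths:
        rnets[path.interface].append(path)
    return dict(rnets)


def untestable_requirements(requirements, paths):
    """Return requirements whose path has no validation point.

    `requirements` maps a requirement name to the message whose
    processing path it constrains.
    """
    by_message = {p.message: p for p in paths}
    return sorted(
        req
        for req, msg in requirements.items()
        if msg not in by_message or not by_message[msg].validation_points
    )


# Hypothetical example data.
paths = [
    ProcessingPath("radar return", "radar", ["validate", "track"], {"track"}),
    ProcessingPath("radar status", "radar", ["log status"], set()),
    ProcessingPath("operator command", "console", ["parse", "dispatch"], {"dispatch"}),
]
requirements = {
    "track within time limit": "radar return",
    "status logged": "radar status",
}
rnets = build_rnets(paths)
```

Here `untestable_requirements(requirements, paths)` flags "status logged", since its path carries no validation point; placing one on that path would make the requirement testable in the sense used above.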

10. SUMMARY AND CONCLUSIONS. Figure 13 shows where in
the system development process each method focuses. In our
survey we started toward the "detailed design" end of the scale and
moved toward methods more applicable in the early requirements
definition phases.

Techniques  for  design  and  implementation  are  concerned  more 
with  the  world  of  concrete  objects,  machines,  details,  and  mathematical 
models.  The  other  end  of  the  scale  is  concerned  with  the  world  of  more 
abstract  objects,  people,  generality  and  imprecise  concepts.  The 
requirements definition area sometimes seems to defy technical
analysis and even tends toward the philosophical. This challenges
software engineering researchers to represent "high level"
knowledge about systems in precise and useful ways. Such research
is in progress, but much more is needed.
Figure 13. Applicability of Methods